Auto-Update Knowledge Base with Drive, LlamaIndex & Azure OpenAI Embeddings
This workflow auto-ingests Google Drive documents, parses them with LlamaIndex, and stores Azure OpenAI embeddings in an in-memory vector store, cutting manual update time from roughly 30 minutes to under 2 minutes per document.
Why Use This Workflow?
Cost Reduction: Eliminates the monthly cloud fee paid just to store knowledge, since embeddings live in an in-memory vector store.
Ideal For
- **Knowledge Managers / Documentation Teams:** Automatically keep product docs and SOPs in sync when source files change on Google Drive.
- **Support Teams:** Ensure the searchable KB is always up to date after doc edits, speeding agent onboarding and resolution time.
- **Developer / AI Teams:** Populate an in-memory vector store for experiments, rapid prototyping, or local RAG demos.
How It Works
1. Trigger: The Google Drive Trigger watches a specific document or folder for updates.
2. Data Collection: The updated file is downloaded from Google Drive.
3. Processing: The file is uploaded to LlamaIndex Cloud via an HTTP Request to create a parsing job.
4. Intelligence Layer: The workflow polls the LlamaIndex job status (Wait + Monitor loop); once the parsing status equals SUCCESS, the result is retrieved as markdown (a minimal API sketch of this upload/poll/retrieve flow follows this list).
5. Output & Delivery: Parsed markdown is loaded into LangChain's Default Data Loader, passed to Azure OpenAI embeddings (deployment "3small"), then inserted into an in-memory vector store.
6. Storage & Logging: The vector store holds embeddings in memory (good for prototyping); optionally persist to an external vector DB for production.
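For orientation, here is a minimal sketch of the same upload → poll → retrieve sequence the HTTP Request nodes perform, written as a standalone Python script. The base URL `https://api.cloud.llamaindex.ai/api/v1`, the response field names (`id`, `markdown`), and the `ERROR` status value are assumptions; the `/parsing/...` paths mirror the node configuration described in Core Nodes below.

```python
# Minimal sketch of the LlamaIndex Cloud parsing flow used by the HTTP Request nodes.
# Assumptions: base URL and response field names may differ from your account;
# adjust LLAMA_BASE and the file path before running.
import os
import time

import requests

LLAMA_BASE = "https://api.cloud.llamaindex.ai/api/v1"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['LLAMA_CLOUD_API_KEY']}"}


def parse_document(path: str, poll_seconds: int = 60) -> str:
    """Upload a file, poll the parsing job, and return the parsed markdown."""
    # 1. Upload: POST multipart/form-data to /parsing/upload (creates a parsing job)
    with open(path, "rb") as fh:
        resp = requests.post(f"{LLAMA_BASE}/parsing/upload",
                             headers=HEADERS, files={"file": fh})
    resp.raise_for_status()
    job_id = resp.json()["id"]  # assumed response field

    # 2. Monitor loop: GET /parsing/job/{jobId} until status == "SUCCESS"
    while True:
        status = requests.get(f"{LLAMA_BASE}/parsing/job/{job_id}",
                              headers=HEADERS).json()["status"]
        if status == "SUCCESS":
            break
        if status == "ERROR":
            raise RuntimeError(f"Parsing job {job_id} failed")
        time.sleep(poll_seconds)  # plays the same role as the Wait node

    # 3. Retrieve result: GET /parsing/job/{jobId}/result/markdown
    result = requests.get(f"{LLAMA_BASE}/parsing/job/{job_id}/result/markdown",
                          headers=HEADERS)
    result.raise_for_status()
    return result.json()["markdown"]  # assumed response field
```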
Setup Guide
Prerequisites
| Requirement | Type | Purpose |
|-------------|------|---------|
| n8n instance | Essential | Import and execute the workflow |
| Google Drive OAuth2 | Essential | Watch and download documents from Google Drive |
| LlamaIndex Cloud API | Essential | Parse and convert documents to structured markdown |
| Azure OpenAI account | Essential | Generate embeddings (deployment named "3small") |
| Persistent vector DB (e.g., Pinecone) | Optional | Persist embeddings for production-scale search |
Installation Steps
1. Import the workflow: open your n8n instance and import the workflow JSON file.
2. Configure credentials:
   - Azure OpenAI: provide the endpoint, API key, and deployment name (a quick sanity check for the deployment is sketched after these steps).
   - LlamaIndex API: create an HTTP Header Auth credential in n8n with Header Name `Authorization` and Header Value `Bearer YOUR_API_KEY`.
   - Google Drive OAuth2: create OAuth 2.0 credentials in Google Cloud Console, enable the Drive API, and configure the Google Drive OAuth2 credential in n8n.
3. Update environment-specific values: replace the workflow's Google Drive fileId with the file ID or folder ID you want to watch (do not commit public IDs).
4. Customize settings:
   - Polling interval (Wait node): adjust for faster or slower job-status checks.
   - Target file or folder: toggled on the Google Drive Trigger node.
   - Embedding model: change the Azure OpenAI deployment if needed.
5. Test execution: save changes and trigger a sample file update on Drive. Verify each node runs and the vector store receives embeddings.
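Before wiring the credentials into the nodes, it can help to confirm the Azure deployment name resolves. The sketch below assumes the standard Azure OpenAI REST embeddings endpoint; the environment variable names and the api-version value are placeholders, not values taken from this workflow.

```python
# Hedged sanity check: call the Azure OpenAI embeddings REST endpoint directly
# to confirm the "3small" deployment exists before configuring n8n credentials.
# AZURE_OPENAI_* variable names and the api-version value are assumptions.
import os

import requests

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]  # e.g. https://<resource>.openai.azure.com
api_key = os.environ["AZURE_OPENAI_API_KEY"]
deployment = "3small"                            # deployment name used by this workflow
api_version = "2024-02-01"                       # assumed; use the version your resource supports

resp = requests.post(
    f"{endpoint}/openai/deployments/{deployment}/embeddings",
    params={"api-version": api_version},
    headers={"api-key": api_key},
    json={"input": "connectivity check"},
)
resp.raise_for_status()
print(len(resp.json()["data"][0]["embedding"]), "dimensions returned")
```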
Technical Details
Core Nodes
| Node | Purpose | Key Configuration |
|------|---------|-------------------|
| Knowledge Base Updated Trigger (Google Drive Trigger) | Triggers on file/folder changes | Set trigger type to specific file or folder; configure OAuth2 credential |
| Download Knowledge Document (Google Drive) | Downloads file binary | Operation: download; ensure OAuth2 credential is selected |
| Parse Document via LlamaIndex (HTTP Request) | Uploads file to LlamaIndex parsing endpoint | POST multipart/form-data to /parsing/upload; use HTTP Header Auth credential |
| Monitor Document Processing (HTTP Request) | Polls parsing job status | GET /parsing/job/{{jobId}}; check status field |
| Check Parsing Completion (If) | Branches on job status | Condition: {{$json.status}} equals SUCCESS |
| Retrieve Parsed Content (HTTP Request) | Fetches parsed markdown result | GET /parsing/job/{{jobId}}/result/markdown |
| Default Data Loader (LangChain) | Loads parsed markdown into document format | Use as document source for embeddings |
| Embeddings Azure OpenAI | Generates embeddings for documents | Credentials: Azure OpenAI; Model/Deployment: 3small |
| Insert Data to Store (vectorStoreInMemory) | Stores documents + embeddings | Use memory store for prototyping; switch to a DB for persistence |
Workflow Logic
On a Drive change, the file binary is downloaded and sent to LlamaIndex. The workflow then enters a monitor loop: Monitor Document Processing fetches the job status and the If node checks it; if the status is not SUCCESS, the Wait node delays before the next check. Once parsing completes, the workflow retrieves the markdown, loads it as documents, creates embeddings via Azure OpenAI, and inserts the data into the in-memory vector store.
Customization Options
Basic Adjustments:
- Poll Delay: set the Wait node (default: every minute) to balance speed vs. API quota.
- Target Scope: switch the trigger from a single file to a folder to auto-handle many docs.
- Embedding Model: swap the Azure deployment for a different model name as needed.
Advanced Enhancements:
- Persistent Vector DB Integration: replace vectorStoreInMemory with Pinecone or Milvus for production search (a minimal upsert sketch follows below).
- Notification: add Slack or email nodes to notify when parsing completes or fails.
- Summarization: add an LLM summarization step to generate chunk-level summaries.
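As one possible persistence path, here is a minimal sketch of upserting precomputed embeddings into Pinecone with its Python SDK. The index name `kb-index`, the vector ID scheme, and the metadata fields are illustrative assumptions, not part of this workflow.

```python
# Hedged sketch: persist embeddings in Pinecone instead of the in-memory store.
# Index name, vector IDs, and metadata keys below are illustrative assumptions.
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("kb-index")  # assumed index, created with your embedding dimension


def persist_chunks(file_id: str, chunks: list[str], embeddings: list[list[float]]) -> None:
    """Upsert one vector per chunk, keyed by Drive file ID and chunk position."""
    index.upsert(vectors=[
        {
            "id": f"{file_id}-{i}",
            "values": vec,
            "metadata": {"source_file": file_id, "text": chunk},
        }
        for i, (chunk, vec) in enumerate(zip(chunks, embeddings))
    ])
```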
Scaling option: batch uploads and chunk documents to reduce embedding calls (see the chunking/batching sketch below); use a queue (Redis or n8n queue patterns) and horizontal workers for high throughput.
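To illustrate the batching idea, the sketch below splits parsed markdown into fixed-size chunks and sends several chunks per embeddings request instead of one call per chunk. The chunk size, batch size, and helper names are illustrative assumptions; the Azure endpoint details match the sanity check shown earlier.

```python
# Hedged sketch: chunk parsed markdown and batch embedding requests to cut API calls.
# Chunk/batch sizes and helper names are illustrative, not workflow settings.
import os

import requests

AZURE_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
AZURE_KEY = os.environ["AZURE_OPENAI_API_KEY"]
DEPLOYMENT = "3small"
API_VERSION = "2024-02-01"  # assumed


def chunk_markdown(markdown: str, chunk_chars: int = 2000) -> list[str]:
    """Naive fixed-size chunking; swap in a token-aware splitter for production."""
    return [markdown[i:i + chunk_chars] for i in range(0, len(markdown), chunk_chars)]


def embed_batch(chunks: list[str]) -> list[list[float]]:
    """One embeddings request for a whole batch of chunks (the API accepts a list input)."""
    resp = requests.post(
        f"{AZURE_ENDPOINT}/openai/deployments/{DEPLOYMENT}/embeddings",
        params={"api-version": API_VERSION},
        headers={"api-key": AZURE_KEY},
        json={"input": chunks},
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]


def embed_document(markdown: str, batch_size: int = 16) -> list[list[float]]:
    """Embed all chunks of a document, batch_size chunks per API call."""
    chunks = chunk_markdown(markdown)
    vectors: list[list[float]] = []
    for start in range(0, len(chunks), batch_size):
        vectors.extend(embed_batch(chunks[start:start + batch_size]))
    return vectors
```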
Performance & Optimization
| Metric | Expected Performance | Optimization Tips |
|--------|----------------------|-------------------|
| Execution time (per doc) | ~10 s–2 min (depends on file size and LlamaIndex processing) | Chunk large docs; run embeddings in batches |
| API calls (per doc) | 3–8 (upload, poll(s), retrieve, embedding calls) | Increase poll interval; consolidate requests |
| Error handling | Retries via Wait loop and If checks | Add exponential backoff, failure notifications, and retry limits |
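The backoff tip above can be approximated with a small wrapper like the one below; the delay values and retry limit are illustrative assumptions. Inside n8n itself, the same idea maps to increasing the Wait node delay and capping loop iterations.

```python
# Hedged sketch: poll a job status with exponential backoff and a retry limit,
# mirroring what the Wait + If loop does in the workflow. Delays are illustrative.
import time
from typing import Callable


def poll_with_backoff(
    check: Callable[[], str],
    *,
    base_delay: float = 30.0,
    max_delay: float = 300.0,
    max_attempts: int = 10,
) -> str:
    """Call check() until it returns "SUCCESS", doubling the delay each attempt."""
    delay = base_delay
    for _ in range(max_attempts):
        status = check()
        if status == "SUCCESS":
            return status
        if status == "ERROR":
            raise RuntimeError("Parsing job reported ERROR")
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise TimeoutError(f"Job did not reach SUCCESS within {max_attempts} attempts")
```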
Troubleshooting
| Problem | Cause | Solution |
|---------|-------|----------|
| Authentication errors | Invalid or missing credentials | Reconfigure n8n Credentials; do not paste API keys directly into nodes |
| File not found | Incorrect fileId or permissions | Verify the Drive fileId and OAuth scopes; share the file with the service account if needed |
| Parsing stuck in PENDING | LlamaIndex processing delay or rate limit | Increase the Wait node interval, monitor the LlamaIndex dashboard, add retry limits |
| Embedding failures | Model/deployment mismatch or quota limits | Confirm the Azure deployment name (3small) and subscription quotas |
Created by: khmuhtadin
Category: Knowledge Management
Tags: google-drive, llamaindex, azure-openai, embeddings, knowledge-base, vector-store