Auto-Update Knowledge Base with Drive, LlamaIndex & Azure OpenAI Embeddings

This workflow auto-ingests Google Drive documents, parses them with LlamaIndex, and stores Azure OpenAI embeddings in an in-memory vector store, cutting manual update time from ~30 minutes to under 2 minutes per document.

Why Use This Workflow?

Cost Reduction: Avoids paying a monthly cloud fee just to store your knowledge base.

Ideal For

- **Knowledge Managers / Documentation Teams:** Automatically keep product docs and SOPs in sync when source files change on Google Drive.
- **Support Teams:** Ensure the searchable KB is always up to date after doc edits, speeding agent onboarding and resolution time.
- **Developer / AI Teams:** Populate an in-memory vector store for experiments, rapid prototyping, or local RAG demos.

How It Works

1. **Trigger:** Google Drive Trigger watches a specific document or folder for updates.
2. **Data Collection:** The updated file is downloaded from Google Drive.
3. **Processing:** The file is uploaded to LlamaIndex Cloud via an HTTP Request to create a parsing job (a rough sketch follows this list).
4. **Intelligence Layer:** The workflow polls the LlamaIndex job status (Wait + Monitor loop). If the parsing status equals SUCCESS, the result is retrieved as markdown.
5. **Output & Delivery:** Parsed markdown is loaded into LangChain's Default Data Loader, passed to Azure OpenAI embeddings (deployment "3small"), then inserted into an in-memory vector store.
6. **Storage & Logging:** The vector store holds embeddings in memory (good for prototyping). Optionally persist to an external vector DB for production.
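As a rough illustration of the Processing step, the upload performed by the HTTP Request node is approximately the following Python sketch. The base URL, the `file` field name, and the `id` field in the response are assumptions; match them to your LlamaIndex Cloud account and the node's actual settings.

```python
import requests

# Assumed base URL for LlamaIndex Cloud's parsing API; adjust to your account.
LLAMAINDEX_BASE = "https://api.cloud.llamaindex.ai/api/parsing"
API_KEY = "YOUR_LLAMAINDEX_API_KEY"  # stored as an HTTP Header Auth credential in n8n

def upload_for_parsing(file_path: str) -> str:
    """Upload a file downloaded from Drive and return the parsing job id."""
    with open(file_path, "rb") as fh:
        resp = requests.post(
            f"{LLAMAINDEX_BASE}/upload",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": fh},  # multipart/form-data, matching the n8n node
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()["id"]  # job-id field name is an assumption
```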

Setup Guide

Prerequisites

| Requirement | Type | Purpose |
|-------------|------|---------|
| n8n instance | Essential | Import and execute the workflow (cloud or self-hosted) |
| Google Drive OAuth2 | Essential | Watch and download documents from Google Drive |
| LlamaIndex Cloud API | Essential | Parse and convert documents to structured markdown |
| Azure OpenAI account | Essential | Generate embeddings (deployment configured with model name "3small") |
| Persistent vector DB (e.g., Pinecone) | Optional | Persist embeddings for production-scale search |

Installation Steps

1. **Import the workflow:** Open your n8n instance and import the workflow JSON file.
2. **Configure credentials:**
   - Azure OpenAI: provide the Endpoint, API Key, and deployment name.
   - LlamaIndex API: create an HTTP Header Auth credential in n8n with Header Name `Authorization` and Header Value `Bearer YOUR_API_KEY`.
   - Google Drive OAuth2: create OAuth 2.0 credentials in Google Cloud Console, enable the Drive API, and configure the Google Drive OAuth2 credential in n8n.
3. **Update environment-specific values:** Replace the workflow's Google Drive `fileId` with the file or folder ID you want to watch (do not commit public IDs).
4. **Customize settings:** Polling interval (Wait node) for faster or slower job-status checks; target file or folder, toggled on the Google Drive Trigger node; embedding model, change the Azure OpenAI deployment if needed.
5. **Test execution:** Save changes and trigger a sample file update on Drive. Verify each node runs and the vector store receives embeddings. A quick standalone check of the Azure OpenAI deployment is sketched after these steps.
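Before wiring the Azure OpenAI credential into n8n, you can confirm the endpoint, key, and deployment name with a minimal embeddings request. The endpoint, key, and API version below are placeholders; substitute your own values.

```python
import requests

AZURE_ENDPOINT = "https://YOUR-RESOURCE.openai.azure.com"  # from the Azure portal
DEPLOYMENT = "3small"                                       # must match the deployment name used in the workflow
API_KEY = "YOUR_AZURE_OPENAI_KEY"
API_VERSION = "2023-05-15"                                  # use the API version your resource supports

resp = requests.post(
    f"{AZURE_ENDPOINT}/openai/deployments/{DEPLOYMENT}/embeddings",
    params={"api-version": API_VERSION},
    headers={"api-key": API_KEY},
    json={"input": "hello knowledge base"},
    timeout=30,
)
resp.raise_for_status()
# Prints the embedding dimension (e.g. 1536 for text-embedding-3-small).
print(len(resp.json()["data"][0]["embedding"]))
```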

Technical Details

Core Nodes

| Node | Purpose | Key Configuration |
|------|---------|-------------------|
| Knowledge Base Updated Trigger (Google Drive Trigger) | Triggers on file/folder changes | Set trigger type to specific file or folder; configure OAuth2 credential |
| Download Knowledge Document (Google Drive) | Downloads file binary | Operation: download; ensure OAuth2 credential is selected |
| Parse Document via LlamaIndex (HTTP Request) | Uploads file to LlamaIndex parsing endpoint | POST multipart/form-data to /parsing/upload; use HTTP Header Auth credential |
| Monitor Document Processing (HTTP Request) | Polls parsing job status | GET /parsing/job/{{jobId}}; check status field |
| Check Parsing Completion (If) | Branches on job status | Condition: {{$json.status}} equals SUCCESS |
| Retrieve Parsed Content (HTTP Request) | Fetches parsed markdown result | GET /parsing/job/{{jobId}}/result/markdown |
| Default Data Loader (LangChain) | Loads parsed markdown into document format | Use as document source for embeddings |
| Embeddings Azure OpenAI | Generates embeddings for documents | Credentials: Azure OpenAI; Model/Deployment: 3small |
| Insert Data to Store (vectorStoreInMemory) | Stores documents + embeddings | Use memory store for prototyping; switch to a DB for persistence |
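For intuition, the in-memory vector store node behaves much like a plain list of (text, vector) pairs queried by cosine similarity. This is a toy stand-in, not n8n's actual implementation; note that everything here is lost when the process restarts, which is why production setups should persist to an external DB.

```python
import math

class InMemoryVectorStore:
    """Toy stand-in for the vectorStoreInMemory node: holds vectors in RAM only."""

    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def insert(self, text: str, embedding: list[float]) -> None:
        self.items.append((text, embedding))

    def search(self, query_embedding: list[float], top_k: int = 3) -> list[str]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        ranked = sorted(self.items, key=lambda it: cosine(query_embedding, it[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]
```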

Workflow Logic

On a Drive change, the file binary is downloaded and sent to LlamaIndex. The workflow then enters a monitor loop: Monitor Document Processing fetches the job status and the If node checks it; if the status is not SUCCESS, the Wait node delays before the next check. When parsing completes, the workflow retrieves the markdown, loads it as documents, creates embeddings via Azure OpenAI, and inserts the data into the in-memory vector store. The loop is roughly equivalent to the sketch below.
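A hedged Python sketch of the Wait + Monitor + If loop. The endpoint paths follow the node table above; the base URL, the failure status value, and the result response shape are assumptions.

```python
import time
import requests

def wait_for_parse(job_id: str, api_key: str, poll_seconds: int = 60, max_attempts: int = 30) -> str:
    """Poll the LlamaIndex job until it succeeds, then return the parsed markdown."""
    base = "https://api.cloud.llamaindex.ai/api/parsing"  # assumed base URL
    headers = {"Authorization": f"Bearer {api_key}"}

    for _ in range(max_attempts):  # retry limit, mirroring the Wait loop
        status = requests.get(f"{base}/job/{job_id}", headers=headers, timeout=30).json()["status"]
        if status == "SUCCESS":  # the If node's condition
            result = requests.get(f"{base}/job/{job_id}/result/markdown", headers=headers, timeout=30)
            result.raise_for_status()
            return result.json().get("markdown", "")  # response shape is an assumption
        if status == "ERROR":  # assumed failure status value
            raise RuntimeError(f"Parsing failed for job {job_id}")
        time.sleep(poll_seconds)  # the Wait node's delay
    raise TimeoutError(f"Job {job_id} did not finish after {max_attempts} polls")
```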

Customization Options

Basic Adjustments:
- **Poll Delay:** Set the Wait node (default: every minute) to balance speed vs. API quota.
- **Target Scope:** Switch the trigger from a single file to a folder to auto-handle many docs.
- **Embedding Model:** Swap the Azure deployment for a different model name as needed.

Advanced Enhancements:
- **Persistent Vector DB Integration:** Replace vectorStoreInMemory with Pinecone or Milvus for production search.
- **Notification:** Add Slack or email nodes to notify when parsing completes or fails.
- **Summarization:** Add an LLM summarization step to generate chunk-level summaries.

Scaling option: Batch uploads and chunk documents to reduce embedding calls (see the sketch below); use a queue (Redis or n8n queue patterns) and horizontal workers for high throughput.
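A sketch of the batching idea: split the parsed markdown into chunks and send them to the embeddings endpoint in groups rather than one request per chunk. The chunk size, batch size, and the `embed_fn` helper (a wrapper around the Azure call shown earlier) are illustrative assumptions.

```python
def chunk_text(markdown: str, max_chars: int = 2000) -> list[str]:
    """Naive fixed-size chunking; swap in a token-aware splitter for production."""
    return [markdown[i:i + max_chars] for i in range(0, len(markdown), max_chars)]

def embed_in_batches(chunks: list[str], embed_fn, batch_size: int = 16) -> list[list[float]]:
    """Call the embeddings API once per batch instead of once per chunk."""
    vectors: list[list[float]] = []
    for i in range(0, len(chunks), batch_size):
        # embed_fn is a hypothetical helper that takes a list of strings
        # and returns one embedding vector per string.
        vectors.extend(embed_fn(chunks[i:i + batch_size]))
    return vectors
```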

Performance & Optimization

| Metric | Expected Performance | Optimization Tips |
|--------|----------------------|-------------------|
| Execution time (per doc) | ~10 s–2 min (depends on file size & LlamaIndex processing) | Chunk large docs; run embeddings in batches |
| API calls (per doc) | 3–8 (upload, poll(s), retrieve, embedding calls) | Increase poll interval; consolidate requests |
| Error handling | Retries via Wait loop and If checks | Add exponential backoff, failure notifications, and retry limits |
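One way to implement the suggested exponential backoff around the status polls, as a sketch; `check_fn` is a hypothetical callable (e.g. one status poll that returns the markdown once the job succeeds), and the delays should be tuned to your LlamaIndex rate limits.

```python
import random
import time

def poll_with_backoff(check_fn, max_attempts: int = 8, base_delay: float = 5.0, max_delay: float = 300.0):
    """Call check_fn until it returns a truthy result, doubling the delay (with jitter) each attempt."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        result = check_fn()  # e.g. returns parsed markdown once status == SUCCESS
        if result:
            return result
        time.sleep(delay + random.uniform(0, 1))  # jitter avoids synchronized retries
        delay = min(delay * 2, max_delay)
    raise TimeoutError(f"Gave up after {max_attempts} attempts")
```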

Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| Authentication errors | Invalid/missing credentials | Reconfigure n8n Credentials; do not paste API keys directly into nodes |
| File not found | Incorrect fileId or permissions | Verify the Drive fileId and OAuth scopes; share the file with the service account if needed |
| Parsing stuck in PENDING | LlamaIndex processing delay or rate limit | Increase the Wait node interval, monitor the LlamaIndex dashboard, add retry limits |
| Embedding failures | Model/deployment mismatch or quota limits | Confirm the Azure deployment name (3small) and subscription quotas |

Created by: khmuhtadin
Category: Knowledge Management
Tags: google-drive, llamaindex, azure-openai, embeddings, knowledge-base, vector-store

