Automate Document Ingestion & RAG System with Google Drive, Sheets & OpenAI
- Overview
The IngestionDocs workflow is a fully automated document ingestion and knowledge management system built with n8n. Its purpose is to continuously ingest organizational documents from Google Drive, transform them into vector embeddings using OpenAI, store them in Pinecone, and make them searchable and retrievable through an AI-powered Q&A interface.
This ensures that employees always have access to the most up-to-date knowledge base without requiring manual intervention.
- Key Objectives
**Automated Ingestion** → Seamlessly process new and updated documents from Google Drive.
**Change Detection** → Track and differentiate between new, updated, and previously processed documents.
**Knowledge Base Construction** → Convert documents into embeddings for semantic search.
**AI-Powered Assistance** → Provide an intelligent Q&A system for employees to query manuals.
**Scalable & Maintainable** → Modular design using n8n, LangChain, and Pinecone.
- Workflow Breakdown
A. Document Monitoring and Retrieval
The workflow begins with two Google Drive triggers:
File Created Trigger → Fires when a new document is uploaded.
File Updated Trigger → Fires when an existing document is modified.
A search operation lists the files in the designated Google Drive folder.
Non-downloadable items (e.g., subfolders) are filtered out.
For valid files:
The file is downloaded.
A SHA256 hash is generated to uniquely identify the file's content.
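The filtering and hashing steps can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual node code: the helper names `downloadable` and `sha256_hex` and the sample listing are hypothetical, and the sketch assumes that "non-downloadable" means only Drive folders, which Google Drive reports with the MIME type `application/vnd.google-apps.folder`.

```python
import hashlib

# Google Drive reports folders with this MIME type; they have no file content to download.
FOLDER_MIME_TYPE = "application/vnd.google-apps.folder"


def downloadable(files):
    """Keep only listing entries that are not folders (hypothetical helper)."""
    return [f for f in files if f.get("mimeType") != FOLDER_MIME_TYPE]


def sha256_hex(content: bytes) -> str:
    """Return the SHA256 digest of the file bytes as a 64-character hex string."""
    return hashlib.sha256(content).hexdigest()


# Hypothetical folder listing, shaped loosely like Drive file metadata.
listing = [
    {"id": "f1", "name": "manual.pdf", "mimeType": "application/pdf"},
    {"id": "d1", "name": "archive", "mimeType": FOLDER_MIME_TYPE},
]

valid = downloadable(listing)
digest = sha256_hex(b"example file bytes")
```

Because the digest depends only on the bytes, renaming a file leaves its hash unchanged, while any edit to the content produces a different hash.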
B. Record Management (Google Sheets Integration)
To keep track of ingestion states, the workflow uses a Google Sheets-based Record Manager:
Each file entry contains:
**Id** (Google Drive file ID)
**Name** (file name)
**hashId** (SHA256 checksum)
The workflow compares the current file's hash with the stored one:
**New Document** → File not found in records → Inserted into the Record Manager.
**Already Processed** → File exists and hash matches → Skipped.
**Updated Document** → File exists but hash differs → Record is updated.
This guarantees that only new or modified content is processed, avoiding duplication.
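The three-way decision above reduces to a small lookup and comparison. The sketch below assumes an in-memory dictionary as a stand-in for the Google Sheet, keyed by the Drive file ID; the function name `classify` and the status strings are hypothetical, while the field names `Id`, `Name`, and `hashId` follow the record layout described above.

```python
from typing import Optional


def classify(record: Optional[dict], current_hash: str) -> str:
    """Compare the current hash with the stored record for the same file ID."""
    if record is None:
        return "new"        # not in the Record Manager: insert a record and ingest
    if record["hashId"] == current_hash:
        return "processed"  # same content as before: skip
    return "updated"        # content changed: update the record and re-ingest


# Hypothetical stand-in for the sheet rows, keyed by Drive file ID.
records = {"f1": {"Id": "f1", "Name": "manual.pdf", "hashId": "abc"}}

status_unchanged = classify(records.get("f1"), "abc")
status_changed = classify(records.get("f1"), "def")
status_new = classify(records.get("f2"), "123")
```

Keying the lookup on the file ID rather than the name means a renamed file with unchanged content is still recognized as already processed.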
C. Document Processing and Vectorization
Once a document is marked as new or updated:
Default Data Loader extracts its content (binary files supported).
Pages are split into individual chunks.
Metadata such as the file ID and name is attached.
Recursive Character Text Splitter divides the content into manageable segments with overlap.
OpenAI Embeddings (text-embedding-3-large) transform each text chunk into a semantic vector.
Pinecone Vector Store stores these vectors in the configured index:
For new documents, embeddings are inserted into a namespace based on the file name.
For updated documents, the namespace is cleared first, then re-ingested with fresh embeddings.
This process builds a scalable and queryable knowledge base.
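To make the chunking and namespace handling concrete, here is a simplified sketch. It is not the LangChain Recursive Character Text Splitter: it uses a plain fixed-size sliding window to show what "segments with overlap" means. The dictionary `store` is a hypothetical in-memory stand-in for a Pinecone index, and `toy_embed` is a placeholder for the OpenAI embedding call, so no real API is involved.

```python
from typing import Callable, Dict, List


def split_with_overlap(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Fixed-size sliding window; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def ingest(store: Dict[str, list], namespace: str, chunks: List[str],
           embed: Callable[[str], List[float]], metadata: dict,
           replace: bool = False) -> None:
    """Insert chunk vectors into a namespace, clearing it first for updated documents."""
    if replace:
        store.pop(namespace, None)  # drop stale vectors before re-ingesting
    bucket = store.setdefault(namespace, [])
    for i, chunk in enumerate(chunks):
        bucket.append({
            "id": f"{metadata['fileId']}-{i}",
            "values": embed(chunk),
            "metadata": {**metadata, "text": chunk},
        })


def toy_embed(text: str) -> List[float]:
    """Placeholder embedding (assumption); the workflow uses text-embedding-3-large."""
    return [float(len(text))]


store: Dict[str, list] = {}
meta = {"fileId": "f1", "name": "manual.pdf"}

# New document: insert into a namespace named after the file.
ingest(store, "manual.pdf", split_with_overlap("abcdefghij", 4, 1), toy_embed, meta)

# Updated document: clear the namespace, then re-ingest the fresh chunks.
ingest(store, "manual.pdf", split_with_overlap("abcdef", 4, 1), toy_embed, meta, replace=True)
```

Clearing the namespace before re-ingesting keeps chunks from an older version of the document out of later search results.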
D. Knowledge Base Q&A Interface
The workflow also provides an interactive form-based user interface:
**Form Trigger** → Collects employee questions.
**LangChain AI Agent**:
Receives the question.
Retrieves relevant context from Pinecone using vector similarity search.
Generates the response using the OpenAI Chat Model (gpt-4.1-mini).
**Answer Formatting**:
Responses are returned in HTML format for readability.
A custom CSS theme ensures a modern, user-friendly design.
Answers may include references to page numbers when available.
This creates a self-service knowledge base assistant that employees can query in natural language.
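The retrieval step can be illustrated with a small in-memory example. The sketch assumes cosine similarity as the metric (the source does not state which metric the Pinecone index uses) and hand-written two-dimensional toy vectors in place of real embeddings. The names `cosine`, `top_k`, and `build_prompt` are hypothetical, and the prompt layout is only one plausible way to hand retrieved context to a chat model.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], items: List[dict], k: int = 2) -> List[dict]:
    """Return the k stored chunks most similar to the query vector."""
    ranked = sorted(items, key=lambda it: cosine(query_vec, it["values"]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: List[dict]) -> str:
    """Assemble retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"- {h['text']}" for h in hits)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy vectors standing in for stored chunk embeddings (illustrative data only).
items = [
    {"text": "Reset the device by holding the power button.", "values": [1.0, 0.0]},
    {"text": "The cafeteria opens at 8 am.", "values": [0.0, 1.0]},
    {"text": "A factory reset erases all settings.", "values": [0.9, 0.1]},
]

hits = top_k([1.0, 0.0], items, k=2)
prompt = build_prompt("How do I reset the device?", hits)
```

In the workflow itself, the agent performs this lookup against Pinecone and the chat model writes the final HTML answer; the sketch only shows why semantically close chunks rank first.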
- Technologies Used
**n8n** → Orchestration of the entire workflow.
**Google Drive API** → File monitoring, listing, and downloading.
**Google Sheets API** → Record manager for tracking file states.
**OpenAI API**:
text-embedding-3-large for semantic vector creation.
gpt-4.1-mini for conversational Q&A.
**Pinecone** → Vector database for embedding storage and retrieval.
**LangChain** → Document loaders, text splitters, vector store connectors, and agent logic.
**Crypto (SHA256)** → File hash generation for change detection.
**Form Trigger + Form Node** → Employee-facing Q&A submission and answer display.
**Custom CSS** → Provides a modern, responsive, styled UI for the knowledge base.
- End-to-End Data Flow
Employee uploads or updates a document → Google Drive detects the change.
Workflow downloads and hashes the file → Ensures uniqueness and detects modifications.
Record Manager (Google Sheets) → Decides whether to skip, insert, or update the record.
Document Processing → Splitting + Embedding + Storing into Pinecone.
Knowledge Base Updated → The latest version of documents is indexed.
Employee asks a question via the web form.
AI Agent retrieves relevant context from Pinecone and passes it to gpt-4.1-mini → Generates a contextual answer.
Answer displayed in styled HTML → Delivered back to the employee through the form interface.
- Benefits
**Always Up-to-Date** → Automatically syncs documents when uploaded or changed.
**No Duplicates** → SHA256 hashing ensures only new or changed documents are reprocessed.
**Searchable Knowledge Base** → Employees can query documents semantically, not just by keywords.
**Enhanced Productivity** → Answers are immediate, reducing time spent browsing manuals.
**Scalable** → New documents and users can be added without workflow redesign.
✅ In summary, IngestionDocs is a robust AI-driven document ingestion and retrieval system that integrates Google Drive, Google Sheets, OpenAI, and Pinecone within n8n. It continuously builds and maintains a knowledge base of manuals while offering employees an intelligent, user-friendly Q&A assistant for fast and accurate knowledge retrieval.