Multi-Format Document Processing for RAG Chatbot with Google Drive & Supabase
This n8n workflow is the data ingestion pipeline for the "RAG System V2" chatbot. It automatically monitors a specific Google Drive folder for new files, processes them based on their type, and inserts their content into a Supabase vector database to make it searchable for the RAG agent.
Key Features & Workflow:
Google Drive Trigger: The workflow starts automatically when a new file is created in a designated folder (named "DOCUMENTS" in this template).
Smart File Handling: A Switch node routes the file based on its MIME type (e.g., PDF, Excel, Google Doc, Word Doc) for correct processing.
Multi-Format Extraction:
PDF: Text is extracted directly using the Extract PDF Text node.
Google Docs: Files are downloaded and converted to plain text (text/plain) and processed by the Extract from Text File node.
Excel: Data is extracted, aggregated, and concatenated into a single text block for embedding.
Word (.doc/.docx): Word files are automatically converted into Google Docs format using an HTTP Request. This newly created Google Doc will then trigger the entire workflow again, ensuring it's processed correctly.
Chunking & Metadata Enrichment: The extracted text is split into manageable chunks using the Recursive Character Text Splitter (set to 2000-character chunks). The Enhanced Default Data Loader then enriches these chunks with crucial metadata from the original file, such as file_name, creator, and created_at.
Vectorization & Storage: Finally, the workflow uses OpenAI Embeddings to create vector representations of the text chunks and inserts them into the Supabase Vector Store.
Related Templates
Send structured logs to BetterStack from any workflow using HTTP Request
Send structured logs to BetterStack from any workflow using HTTP Request Who is this for? This workflow is perfect for...
Provide latest euro exchange rates from European Central Bank via Webhook
What is this workflow doing? This simple workflow is pulling the latest Euro foreign exchange reference rates from the E...
Convert Tour PDFs to Vector Database using Google Drive, LangChain & OpenAI
🧩 Workflow: Process Tour PDF from Google Drive to Pinecone Vector DB with OpenAI Embeddings Overview This workflow au...
🔒 Please log in to import templates to n8n and favorite templates
Workflow Visualization
Loading...
Preparing workflow renderer
Comments (0)
Login to post comments