📥 Transform Google Drive Documents into Vector Embeddings
Automatically convert documents from Google Drive into vector embeddings using OpenAI, LangChain, and PGVector — fully automated through n8n.
⚙️ What It Does
This workflow monitors a Google Drive folder for new files, supports multiple file types (PDF, TXT, JSON), and processes them into vector embeddings using OpenAI’s text-embedding-3-small model. These embeddings are stored in a Postgres database using the PGVector extension, making them query-ready for semantic search or RAG-based AI agents.
After successful processing, files are moved to a separate “vectorized” folder so they are not processed again on the next run.
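Being "query-ready for semantic search" means a retrieval step can rank stored chunks by how close their embeddings are to an embedded question. The Python sketch below computes cosine distance in plain Python; this is the same quantity pgvector's `<=>` operator returns. The function names `cosine_distance` and `top_k` and the toy 3-dimensional vectors are illustrative assumptions, not part of the workflow (real text-embedding-3-small vectors have 1536 dimensions by default).

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity, the value pgvector's <=> operator computes.

    Assumes both vectors are non-zero and have the same length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[tuple[str, list[float]]], k: int = 3) -> list[tuple[str, float]]:
    """Rank stored (chunk_text, embedding) rows by cosine distance to the query, closest first."""
    scored = [(text, cosine_distance(query, emb)) for text, emb in rows]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical stored chunks with toy embeddings.
    rows = [
        ("invoice terms", [0.9, 0.1, 0.0]),
        ("holiday policy", [0.0, 1.0, 0.1]),
        ("payment schedule", [0.8, 0.2, 0.1]),
    ]
    for text, dist in top_k([1.0, 0.0, 0.0], rows, k=2):
        print(f"{dist:.3f}  {text}")
```

In a real deployment this ranking would run inside Postgres rather than in application code, so only the closest rows leave the database.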
💡 Use Cases
Powering Retrieval-Augmented Generation (RAG) AI agents
Semantic search across private documents
AI assistant knowledge ingestion
Automated document pipelines for indexing or classification
🧠 Workflow Highlights
Trigger Options: Manual or Scheduled (3 AM daily by default)
Supported File Types: PDF, TXT, JSON
Embedding Stack: LangChain Text Splitter, OpenAI Embeddings, PGVector
Deduplication: Files are moved after processing
License: CC BY-SA 4.0
Author: AlexK1919
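Before embedding, the text splitter cuts each extracted document into overlapping chunks so that every chunk fits the embedding model's input and context is not lost at chunk boundaries. The sketch below is a simplified fixed-window splitter, not LangChain's implementation (LangChain's recursive splitter prefers breaking on separators such as paragraphs and newlines). The function name `split_text` and the `chunk_size`/`chunk_overlap` values of 1000 and 200 characters are assumptions for illustration.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

Each chunk would then be embedded separately and stored as its own row, which is why retrieval returns passages rather than whole files.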
🛠 What You’ll Need
Google Drive OAuth2 credentials (connected to the Search Folder, Download File, and Move File nodes)
OpenAI API Key (used in the Embeddings OpenAI node)
Postgres + PGVector database (connected in the Postgres PGVector Store node)
🔧 Step-by-Step Setup Instructions
Create Google Drive OAuth2 credentials in n8n and connect them to all Google Drive nodes.
Set your source folder ID in the Search Folder node. This is where incoming files are placed.
Set your processed folder ID in the Move File node. Files are moved here after vectorization.
Make sure you have a Postgres instance with the PGVector extension enabled, and enter the table name and collection in the Postgres PGVector Store node.
Add your OpenAI credentials to the Embeddings OpenAI node and select text-embedding-3-small.
Optional: Activate the Schedule Trigger node to run daily, or configure your own schedule.
To ingest on demand, run the workflow manually from the When clicking ‘Test workflow’ trigger node.
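The setup relies on one rule: a file leaves the source folder only after it has been vectorized successfully, so the source folder always holds exactly the files still waiting to be processed. The sketch below models that rule with an in-memory `Folder` class and an `ingest` function; both are hypothetical stand-ins for the Search Folder, embedding, and Move File nodes, not n8n or Google Drive APIs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Folder:
    """Hypothetical in-memory stand-in for a Google Drive folder."""
    name: str
    files: list[str] = field(default_factory=list)


def ingest(source: Folder, processed: Folder, vectorize: Callable[[str], None]) -> list[str]:
    """Vectorize each file in source; move it to processed only if vectorizing succeeded.

    Returns the names of files that failed and were left in source.
    """
    failed = []
    for file_name in list(source.files):  # iterate over a copy because source.files is mutated
        try:
            vectorize(file_name)
        except Exception:
            failed.append(file_name)  # stays in source, so a later run picks it up again
            continue
        source.files.remove(file_name)
        processed.files.append(file_name)
    return failed
```

Because successful files are gone from the source folder, rerunning the sketch never embeds the same file twice, while failed files are retried automatically.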
🧩 Customization Tips
Want to support more file types or enhance the pipeline?
Add new extractors: Use Extract from File with other formats such as DOCX, Markdown, or HTML.
Refine logic by file type: The Switch node routes files to the correct extraction method based on MIME type (application/pdf, text/plain, application/json).
Pre-process with OCR: Add an OCR step before extraction to handle scanned PDFs or images.
Add filters: Extend the Search Folder or Switch node logic to skip specific files or folders.
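The Switch node's routing amounts to a lookup from MIME type to extraction method, with unmatched types skipped. The Python sketch below shows that lookup; the three MIME types come from the description above, while the route labels (`extract_pdf` and so on) and the `route_file` name are hypothetical.

```python
from typing import Optional

# MIME types handled by the workflow's Switch node; the route labels are hypothetical.
ROUTES = {
    "application/pdf": "extract_pdf",
    "text/plain": "extract_text",
    "application/json": "extract_json",
}


def route_file(mime_type: str) -> Optional[str]:
    """Return the extraction route for a MIME type, or None if the file should be skipped."""
    base_type = mime_type.split(";", 1)[0].strip().lower()  # drop parameters such as charset
    return ROUTES.get(base_type)


if __name__ == "__main__":
    for mime in ("application/pdf", "text/plain; charset=utf-8", "image/png"):
        print(mime, "->", route_file(mime))
```

In this sketch, supporting another format is one more dictionary entry (for example `text/html`) paired with a matching extractor.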
📄 License
This workflow is available under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. You are free to use, adapt, and share this workflow for non-commercial purposes under the terms of this license.
Full license details: https://creativecommons.org/licenses/by-nc-sa/4.0/