Sync Google Drive documents to Pinecone RAG with Google Gemini embeddings
Quick overview
This workflow runs on a schedule to sync files from a Google Drive folder into a Pinecone vector index for retrieval-augmented generation (RAG). It extracts text from PDFs, XLSX files, Google Docs, and Google Sheets, generates embeddings with Google Gemini, and tracks file state in a Google Sheets log so that it can handle updates and deletions.
How it works
Runs on a schedule and fetches the current file list from a target Google Drive folder and the existing file log from Google Sheets.
Compares the Google Drive files with the Google Sheets log to detect new or updated files to ingest and files that were deleted from Drive.
For new or updated files, deletes any existing vectors in Pinecone for the file ID, downloads the file from Google Drive, and routes it by MIME type.
Extracts text from PDFs, XLSX/Google Sheets, and plain text/Google Docs files, and maps the extracted content to the file metadata (file ID, name, modified time, and MIME type).
Chunks the document text, generates embeddings with Google Gemini, and inserts the resulting vectors and metadata into a Pinecone index.
Appends or updates the Google Sheets log with the latest file metadata. For files deleted from Drive, deletes the matching vectors in Pinecone and removes the corresponding log rows.
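The comparison and routing steps above can be sketched in a few lines of Python. This is an illustration only, not the workflow's actual node logic: the `FileRecord` type, the `plan_sync` and `route` functions, and the extractor labels are hypothetical names, and treating any difference in modified time as an update is an assumption about how the workflow detects changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record shape shared by Drive listings and log rows."""
    file_id: str
    name: str
    modified_time: str  # timestamp string as reported by Drive
    mime_type: str = ""


def plan_sync(drive_files, logged_files):
    """Return (to_ingest, to_delete) from the Drive listing and the log rows."""
    logged_by_id = {row.file_id: row for row in logged_files}
    drive_ids = {f.file_id for f in drive_files}

    # New files are missing from the log. Updated files are assumed to be
    # those whose modified time differs from the logged value.
    to_ingest = [
        f for f in drive_files
        if f.file_id not in logged_by_id
        or logged_by_id[f.file_id].modified_time != f.modified_time
    ]
    # Logged files that no longer appear in the Drive listing were deleted.
    to_delete = [row for row in logged_files if row.file_id not in drive_ids]
    return to_ingest, to_delete


# Hypothetical routing table: MIME type -> extractor label.
EXTRACTOR_BY_MIME = {
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "spreadsheet",
    "application/vnd.google-apps.spreadsheet": "spreadsheet",
    "application/vnd.google-apps.document": "text",
    "text/plain": "text",
}


def route(mime_type):
    """Pick an extractor label for a MIME type, or 'unsupported'."""
    return EXTRACTOR_BY_MIME.get(mime_type, "unsupported")
```

Given a Drive listing and a log, `plan_sync` yields the files to re-ingest and the log rows whose vectors should be removed. Deleting a file's existing vectors before re-ingesting it, as the workflow does, keeps stale chunks from lingering when a document shrinks.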
Setup
Connect Google Drive OAuth2 credentials and set the folder ID to the Drive folder you want to sync.
Connect Google Sheets OAuth2 credentials and set the spreadsheet and sheet used as the sync log (it must include at least the file_id, name, and modifiedTIme columns).
Create or select a Pinecone index (for example, gdrive-rag), add Pinecone API credentials, and ensure the Pinecone delete endpoint and API key header are configured correctly.
Add Google Gemini (PaLM) API credentials for the embeddings model used by the workflow.
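To make the "delete endpoint and API key header" step concrete, here is a hedged sketch of what such a request might look like. It assumes the Pinecone data-plane `POST /vectors/delete` operation on the index host with an `Api-Key` header, and it assumes the vectors carry a metadata field named `file_id`. The function name, the host value, and the metadata field name are illustrative, not taken from the workflow. Whether deletion by metadata filter is available depends on the index type, so check the Pinecone documentation for your index.

```python
import json


def build_delete_request(index_host, api_key, file_id, namespace=""):
    """Build (url, headers, body) for deleting one file's vectors by filter.

    Nothing is sent here; the tuple can be passed to any HTTP client.
    """
    url = f"https://{index_host}/vectors/delete"
    headers = {
        "Api-Key": api_key,  # Pinecone expects the key in this header
        "Content-Type": "application/json",
    }
    # Assumption: each vector was stored with a `file_id` metadata field.
    payload = {"filter": {"file_id": {"$eq": file_id}}}
    if namespace:
        payload["namespace"] = namespace
    return url, headers, json.dumps(payload)
```

In an n8n HTTP Request node the same three pieces map to the URL, the header parameters, and the JSON body, with the index host copied from the Pinecone console for the chosen index.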