Ingest and search Cloudflare R2 media with Gemini, Groq Whisper, and Supabase

Quick overview
This workflow ingests images, PDFs, and videos from a Cloudflare R2 folder. It uses Google Gemini to analyze the PDFs, images, and video frames, and Groq speech-to-text (Whisper) to transcribe video audio, in order to generate searchable descriptions and tags. It then stores the resulting embeddings in a Supabase pgvector table.

How it works
Receives a webhook request containing a Cloudflare R2 bucket and folder URL, then lists the objects in that folder.
Filters to supported file types, builds public CDN URLs and timestamps, and routes each item as an image, PDF, or video.
For images, calls Google Gemini with the image URL to generate structured metadata (summary, detailed description, tags, and scores).
For PDFs, calls Google Gemini to analyze the document URL and return the same structured metadata.
For videos, downloads each file locally, extracts representative frames with FFmpeg for Google Gemini visual analysis, extracts audio, transcribes it with Groq Whisper, and tags transcript chunks with Groq Llama.
Normalizes results into a single text “content” field plus JSON metadata, generates Google Gemini embeddings, and inserts the vectors into Supabase (pgvector).
Receives a separate webhook query, retrieves the most similar items from Supabase using embeddings, and returns ranked matches in the webhook response.
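
The filter-and-route step can be illustrated with a small Python sketch. This is not the workflow's actual Code node: the extension list, the field names (`type`, `url`, `ingested_at`), and the CDN base URL are assumptions made for illustration, since the description does not specify them.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from urllib.parse import quote

# Assumed extension-to-route mapping; the workflow's real list is not
# given in the description.
ROUTES = {
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".webp": "image",
    ".pdf": "pdf",
    ".mp4": "video", ".mov": "video", ".webm": "video",
}


def route_objects(keys, cdn_base):
    """Drop unsupported files and tag the rest as image, pdf, or video.

    keys: object keys as returned by a bucket listing.
    cdn_base: public base URL of the bucket (hypothetical value in the
    usage note below).
    """
    stamp = datetime.now(timezone.utc).isoformat()
    base = cdn_base.rstrip("/")
    items = []
    for key in keys:
        kind = ROUTES.get(PurePosixPath(key).suffix.lower())
        if kind is None:
            continue  # unsupported file type, skipped
        items.append({
            "key": key,
            "type": kind,
            # quote() keeps "/" and percent-encodes spaces and the like
            "url": f"{base}/{quote(key)}",
            "ingested_at": stamp,
        })
    return items
```

For example, `route_objects(["media/a b.PNG", "media/notes.txt"], "https://cdn.example.com")` would keep only the PNG, tagged as an image, with its key percent-encoded in the URL.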

Setup
Create a Cloudflare R2 bucket with publicly accessible object URLs, and add Cloudflare R2 credentials in n8n.
Set up a Supabase project with pgvector enabled and a table named vec10, then add Supabase credentials in n8n.
Add Google Gemini credentials (Google PaLM/Gemini API) for embeddings, and provide an HTTP Header Auth credential for the Gemini HTTP requests.
Set the GROQ_API_KEY environment variable for the Groq Whisper transcription and Llama tag extraction calls.
If you enable video processing, install curl, ffmpeg, and ffprobe on the n8n host and update the local directory paths (temp root, frames directory, and video directory) in the workflow inputs.
Copy the ingest webhook (/vector-ingest) and query webhook (/vector-query) URLs and configure your upstream app to send the expected JSON payloads.
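
Before enabling video processing, it may help to check the host prerequisites listed above. The following Python sketch is a hypothetical helper, not part of the workflow. It only checks that GROQ_API_KEY is set and that curl, ffmpeg, and ffprobe are found on the PATH; it does not validate the key or the n8n credentials.

```python
import os
import shutil

# Command-line tools the setup notes say the video branch needs.
REQUIRED_TOOLS = ("curl", "ffmpeg", "ffprobe")


def preflight(env=None, which=shutil.which):
    """Return a list of human-readable problems; empty means all checks passed.

    env and which are injectable so the check can be exercised without
    touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("GROQ_API_KEY"):
        problems.append("GROQ_API_KEY is not set")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("missing:", problem)
```

Run on the n8n host (or inside its container), the script prints one line per missing prerequisite and nothing when everything is in place.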

Additional info
Video: the FFmpeg Code nodes split each video into "video_frames" and "video_transcripts" items, so that frames and transcript chunks can be handled and stored in pgvector individually. The exposed query webhook lets a voice agent find and display the full video, pulled from the Cloudflare R2 bucket, by following the matching video_frames or video_transcripts items returned by the vector query.
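
As a rough illustration of how a client such as the voice agent might map query matches back to full videos, here is a minimal Python sketch. The match shape is an assumption: the `metadata.type`, `metadata.video_url`, and `similarity` field names are hypothetical, since the description does not document the query response format.

```python
VIDEO_ITEM_TYPES = ("video_frames", "video_transcripts")


def resolve_videos(matches):
    """Collapse frame and transcript matches into one entry per parent video.

    matches: list of dicts with hypothetical keys "similarity" and
    "metadata" (the latter holding "type" and "video_url").
    Returns videos sorted by their best-scoring match, highest first.
    """
    best = {}
    for match in matches:
        meta = match.get("metadata", {})
        if meta.get("type") not in VIDEO_ITEM_TYPES:
            continue  # image and PDF matches are not tied to a video
        url = meta.get("video_url")
        if url is None:
            continue
        score = match.get("similarity", 0.0)
        if url not in best or score > best[url]["similarity"]:
            best[url] = {
                "video_url": url,
                "similarity": score,
                "matched_via": meta["type"],
            }
    return sorted(best.values(), key=lambda v: v["similarity"], reverse=True)
```

Keeping only the best match per video avoids listing the same video several times when multiple frames or transcript chunks score highly for one query.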

Complexity: intermediate
Author: Dave Sartori
Created: 6/26/2026
Updated: 6/26/2026
