Scrape and ingest web pages into a Pinecone RAG stack with Firecrawl and OpenAI
What this does
Receives a URL via webhook, uses Firecrawl to scrape the page into clean markdown, and stores it as vector embeddings in Pinecone. A visual, self-hosted ingestion pipeline for RAG knowledge bases. Adding a new source is as simple as sending a URL.
The second part of the workflow exposes a chat interface where an AI Agent queries the stored knowledge base to answer questions, with Cohere reranking for better retrieval quality.
How it works
Part 1: Ingestion Pipeline
- Webhook receives a POST request with a url field
- Verify URL validates and normalizes the domain, returning a 422 error if invalid
- Firecrawl /scrape fetches the page and converts it to clean markdown
- Embeddings OpenAI generates 1536-dimensional vector embeddings from the scraped content
- Default Data Loader attaches the source URL as metadata
- Pinecone Vector Store inserts the content and embeddings into the index
- Respond to Webhook confirms how many items were added
Part 2: RAG Chat Agent
- Chat trigger receives a user question
- AI Agent (OpenRouter / Claude Sonnet) queries the Pinecone vector store
- Cohere Reranker improves retrieval quality before the agent responds
- The agent answers based solely on the ingested knowledge base
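Conceptually, the vector store query in Part 2 is a nearest-neighbor search by cosine similarity: the question is embedded, compared against the stored vectors, and the closest chunks are handed to the reranker. A toy sketch with 2-d vectors (the real pipeline uses 1536-dimensional OpenAI embeddings, Pinecone's index, and Cohere for reranking; the function names here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, docs, k=2):
    """docs: list of (text, vector) pairs.

    Returns the k texts most similar to the query vector --
    the candidate set a reranker would then reorder.
    """
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

Reranking then rescores just these top-k candidates against the raw question text, which is cheaper and more accurate than reranking the whole index.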
- Firecrawl
- Pinecone
- OpenAI Embeddings
- OpenRouter (Claude Sonnet)
- Cohere Reranker
Webhook usage
Send a POST request to the webhook URL:
curl -X POST https://your-n8n-instance/webhook/your-id \
  -H "Content-Type: application/json" \
  -d '{"url": "firecrawl.dev"}'
Pinecone setup
Your Pinecone index must be configured with 1536 dimensions to match the OpenAI text-embedding-3-small model output. See the sticky note inside the workflow for the exact index settings.
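For reference, creating a matching index with the Pinecone Python client (v3+) looks roughly like this. The index name, cloud, and region are placeholders; check the workflow's sticky note for the exact settings.

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Dimension must match text-embedding-3-small's 1536-dim output;
# cosine is the usual metric for OpenAI embeddings.
pc.create_index(
    name="rag-knowledge-base",  # placeholder index name
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```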
Requirements
- Firecrawl API key
- OpenAI API key (for embeddings)
- OpenRouter API key (for the chat agent)
- Cohere API key (for reranking)
- Pinecone account with a properly configured index