Build a company website RAG chatbot using Apify, Pinecone and Gemini

Author: Fabian Maume

AI chatbots are only as good as the data they learn from. Most large language models (LLMs) rely only on their training datasets.

If you want a chatbot to know more about your business, the best approach is to implement a retrieval-augmented generation (RAG) pipeline that supplies Gemini with your website data. This workflow helps you do that.

This workflow uses a scheduler to scrape a website on a regular basis using Apify; web pages are then indexed or updated in a Pinecone vector database. This allows the chatbot to provide accurate and up-to-date information. The workflow uses Google's Gemini AI for both embeddings and response generation.

How does it work?

This workflow is split into 2 sub-logics, highlighted with green sticky notes:

- RAG training logic
- Chatbot logic

RAG training logic

- Use the Apify Website Content Crawler to retrieve all content from your website.
- The Pinecone Vector Store node indexes the text chunks in a Pinecone index.
- The Embeddings Google Gemini node generates embeddings for each text chunk.

Chatbot logic

- The Chat Trigger node receives user questions through a chat interface.
- An AI Agent node handles those requests.
- The AI Agent node uses a Vector Store Tool node, linked to a Pinecone Vector Store node in query mode, to retrieve relevant text chunks from Pinecone based on the user's question.
- The AI Agent sends the retrieved information and the user's question to the Google Gemini Chat Model (gemini-pro).

How to set up this template?

All nodes with an orange sticky note require setup.

Get your tools set up:

1. Google Cloud project and Vertex AI API: Create a Google Cloud project. Enable the Vertex AI API for your project. Obtain a Google AI API key from Google AI Studio.

2. Apify account: Create an Apify account.

3. Pinecone account: Create a free account on the Pinecone website. Obtain your API key from your Pinecone dashboard. Create an index named company-website in your Pinecone project.
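The company-website index can be created in the Pinecone dashboard as described above. For readers who prefer to script that step, here is a minimal sketch, assuming the official `pinecone` Python package and a `PINECONE_API_KEY` environment variable. The dimension of 768, the cosine metric, and the serverless cloud and region are assumptions that do not come from the template; the dimension in particular must match the output size of the embedding model selected in the Embeddings Google Gemini node.

```python
import os


def index_settings(dimension: int, name: str = "company-website") -> dict:
    """Collect the settings for the index named in the setup steps."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    # The metric is an assumption; cosine is a common choice for text embeddings.
    return {"name": name, "dimension": dimension, "metric": "cosine"}


def create_index(settings: dict) -> None:
    """Create the index through the Pinecone Python client."""
    # Imported lazily so index_settings() works without the package installed.
    from pinecone import Pinecone, ServerlessSpec

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    # Cloud and region are assumptions; use the ones offered by your plan.
    pc.create_index(spec=ServerlessSpec(cloud="aws", region="us-east-1"), **settings)


# 768 is an assumed embedding size; check your embedding model before using it.
settings = index_settings(dimension=768)
print(settings)
# create_index(settings)  # uncomment once PINECONE_API_KEY is set
```

If the index dimension and the embedding size differ, Pinecone rejects the vectors at insert time, so it is worth confirming the number before the first scheduled run.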

Configure credentials in your n8n environment for:

- Google Gemini(PaLM) Api (using your Google AI API key)
- Pinecone API (using your Pinecone API key)

Set up the trigger frequency: Edit the Schedule Trigger to match how often you want to update your RAG data. If you want to train your chatbot only once, you can replace it with a click trigger.

Set up the Apify node:

- Authenticate (via OAuth or an API token).
- Set your website URL in the JSON input.
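As an illustration of the JSON input mentioned above, here is a small sketch that builds and prints it. It assumes the Website Content Crawler reads its entry points from a `startUrls` list of `{"url": ...}` objects; check the Actor's input schema before relying on that field name. The helper `build_crawler_input` and the example URL are hypothetical.

```python
import json
from urllib.parse import urlparse


def build_crawler_input(website_url: str) -> dict:
    """Build a minimal JSON input for the Apify node (field name assumed)."""
    parsed = urlparse(website_url)
    # Reject relative or non-web URLs early, before the crawl is scheduled.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an absolute http(s) URL: {website_url!r}")
    return {"startUrls": [{"url": website_url}]}


# Placeholder URL; replace it with your own website.
crawler_input = build_crawler_input("https://www.example.com")
print(json.dumps(crawler_input, indent=2))
```

The printed JSON can be pasted into the Apify node's input field. Adding more entries to the list is one way to combine several websites in a single crawl.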

FAQ

What is RAG?

RAG stands for retrieval-augmented generation. It is a technique that provides an AI model (such as a large language model) with additional data at query time. That allows the LLM to give more up-to-date and topic-specific information.

What is the difference between RAG and LLM?

RAG is a way to complement an LLM by giving it more up-to-date information. You can think of the LLM as the CPU processing your question, and RAG as the hard drive providing information.

Do I have to use my website as training data?

No. Website Content Crawler can scrape any website. So you can, in theory, use this template to build a RAG for someone else. You can even combine data from multiple websites.

Can I use a model other than Gemini?

In theory, yes. You could replace the Gemini node with another LLM. If you are looking for inspiration about RAG implementation with an Ollama model, check out this template.
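To make the retrieval idea from the FAQ concrete, here is a toy sketch of the chunk, embed, retrieve, and prompt steps in plain Python. It is not what the workflow runs: the bag-of-words `embed` function stands in for the Gemini embeddings, the in-memory list stands in for the Pinecone index, and the two sample pages are made up for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, index: list[tuple[str, Counter]], top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the question into one prompt for the LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up pages standing in for crawled website content.
pages = [
    "Our support team answers email from Monday to Friday.",
    "The starter plan costs 10 euros per month and includes one seat.",
]
# "Training" here only means indexing: each chunk is stored with its vector.
index = [(chunk, embed(chunk)) for page in pages for chunk in chunk_text(page)]

question = "How much does the starter plan cost?"
prompt = build_prompt(question, retrieve(question, index))
print(prompt)
```

The model itself is never retrained in this scheme. Only the index changes when the website changes, which is why a scheduled re-crawl is enough to keep answers current.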

Complexity: intermediate

Category: Data Processing

Created: 3/25/2026

Updated: 5/30/2026
