Build a local RAG chatbot with Ollama, Qwen, BGE-M3 and Postgres PGVector

Author: Wassim Abid

Build a fully local RAG chatbot using Ollama that works without tool calling — ideal for smaller open-source models like Qwen that don't support native function calls. This template lets you run a private, self-hosted AI assistant with retrieval-augmented generation using only your own hardware.

How it works

- A Webhook receives the user's chat message.
- A small classifier LLM (Qwen 7B) analyzes the input and decides whether it is small talk or a real question that needs the knowledge base.
- For small talk, a dedicated AI agent responds conversationally with chat memory.
- For real questions, the classifier generates focused sub-queries, which are sent through a loop-based RAG pipeline:
  - Each sub-query is embedded using BGE-M3 and matched against a Postgres PGVector store.
  - Results are filtered by a relevance score threshold (>0.4).
  - Chunks are aggregated and deduplicated across all sub-queries.
- An Answer Generator agent (Qwen 14B) produces a sourced answer using a strict 3-step format: short answer → sources → follow-up question.
- Both paths use Postgres-backed chat memory for multi-turn conversations.
- A post-processing step removes <think> tags that some reasoning models produce.
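The filtering, deduplication and <think>-tag cleanup described above can be sketched in plain Python. This is an illustration only, not the template's actual node code: the Chunk type, the function names, and the choice to deduplicate on (source, text) are assumptions, as is treating the score as a similarity where higher means more relevant. Only the 0.4 threshold comes from the description.

```python
import re
from dataclasses import dataclass

SCORE_THRESHOLD = 0.4  # relevance cutoff stated in the template description


@dataclass(frozen=True)
class Chunk:
    """Hypothetical shape of one retrieved chunk."""
    text: str
    source: str
    score: float  # assumed: similarity score, higher is more relevant


def filter_and_dedupe(results_per_query, threshold=SCORE_THRESHOLD):
    """Keep chunks scoring above the threshold and merge repeats across sub-queries.

    When the same chunk is returned for several sub-queries, the copy with
    the highest score is kept. The (source, text) key is an assumption.
    """
    best = {}
    for results in results_per_query:
        for chunk in results:
            if chunk.score <= threshold:
                continue  # below the relevance cutoff
            key = (chunk.source, chunk.text.strip())
            if key not in best or chunk.score > best[key].score:
                best[key] = chunk
    # Most relevant chunks first, ready to be placed in the answer prompt.
    return sorted(best.values(), key=lambda c: c.score, reverse=True)


# Matches a complete <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_think(answer: str) -> str:
    """Remove reasoning blocks that some models emit before the final answer."""
    return THINK_BLOCK.sub("", answer).strip()
```

Each inner list passed to filter_and_dedupe holds the results of one sub-query, so a chunk retrieved by two sub-queries appears only once in the output. The regex only removes well-formed, closed <think> blocks; an unclosed tag would be left in place.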

Set up steps

1. Install Ollama and pull the required models:
   - ollama pull qwen2.5:7b (classifier + small talk)
   - ollama pull qwen3:14b (answer generation)
   - ollama pull bge-m3 (embeddings)
2. Set up PostgreSQL with the pgvector extension enabled.
3. Create your vector store: ingest your documents into the PGVector store using BGE-M3 embeddings (you can use n8n's built-in document loaders for this).
4. Configure credentials in n8n:
   - Ollama connection (default: http://localhost:11434)
   - PostgreSQL connection for both chat memory and vector store
5. Customize the webhook path and connect it to your frontend or API client.
6. Optional: adjust the relevance score threshold, swap models for larger or smaller ones, or modify the system prompts to match your use case.
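To sanity-check the setup outside n8n, a small script along these lines may help. It is a sketch under assumptions: it uses Ollama's /api/embed endpoint and pgvector's cosine-distance operator (<=>), while the table name "documents" and the column names "text", "metadata" and "embedding" are hypothetical placeholders for whatever your vector store actually uses. The embedding column is declared as vector(1024) because BGE-M3 produces 1024-dimensional dense vectors.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default from the setup steps
EMBED_MODEL = "bge-m3"

# Enables pgvector in the current database (run once, with sufficient privileges).
ENABLE_PGVECTOR_SQL = "CREATE EXTENSION IF NOT EXISTS vector;"

# Hypothetical table layout; adjust names to match your own vector store.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    text text,
    metadata jsonb,
    embedding vector(1024)
);
"""

# Cosine similarity is 1 minus the cosine distance returned by <=>.
SIMILARITY_SQL = """
SELECT text, metadata, 1 - (embedding <=> %(q)s::vector) AS score
FROM documents
ORDER BY embedding <=> %(q)s::vector
LIMIT %(k)s
"""


def build_embed_request(text, base_url=OLLAMA_URL, model=EMBED_MODEL):
    """Build (but do not send) a POST request to Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text):
    """Send the request; needs a running Ollama instance with bge-m3 pulled."""
    with urllib.request.urlopen(build_embed_request(text), timeout=60) as resp:
        return json.load(resp)["embeddings"][0]


def to_pgvector_literal(vector):
    """Format a list of floats as a pgvector text literal such as [0.1,0.2]."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"
```

With a PostgreSQL driver that supports named parameters (psycopg, for example), the query could be run as cursor.execute(SIMILARITY_SQL, {"q": to_pgvector_literal(embed("my question")), "k": 5}). Rows whose score falls at or below the relevance threshold would then be dropped, mirroring the filter in the workflow.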

Category: Data Processing

Created: 4/16/2026

Updated: 6/27/2026
