Build a PDF Search System with Mistral OCR and Weaviate DB

Build a PDF to Vector RAG System: Mistral OCR, Weaviate Database and MCP Server

An n8n workflow for RAG (Retrieval-Augmented Generation) that turns PDF documents into searchable vector embeddings: Mistral OCR extracts the text, Cohere embeds it, and Weaviate stores the vectors for semantic search.

🚀 Features

- **PDF Document Processing**: Upload and extract text from PDF files using Mistral's OCR capabilities
- **Vector Database Storage**: Store document embeddings in Weaviate vector database for efficient retrieval
- **AI-Powered Search**: Search through documents using semantic similarity with Cohere embeddings
- **MCP Server Integration**: Expose the knowledge base as an AI tool through MCP (Model Context Protocol)
- **Document Metadata**: Basic document metadata including filename, content, source, and upload timestamp
- **Text Chunking**: Automatic text splitting for optimal vector storage and retrieval
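
To make the text chunking feature concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the template's own splitter (n8n handles splitting inside the workflow); the function name `chunk_text` and the default sizes are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears intact in one of the two chunks.
    The default sizes are illustrative assumptions, not template values.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Larger chunks keep more context per embedding, while smaller chunks make retrieval more precise; the overlap trades some storage for fewer broken sentences.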

🛠️ Technologies Used

- **Mistral AI**: OCR and text extraction from PDF documents
- **Weaviate**: Vector database for storing and retrieving document embeddings
- **Cohere**: Multilingual embeddings and reranking for improved search accuracy
- **MCP (Model Context Protocol)**: AI tool integration for external AI workflows
- **n8n**: Workflow automation and orchestration

📋 Prerequisites

Before using this template, you'll need to set up the following credentials:

- **Mistral Cloud API**: For PDF text extraction
- **Weaviate API**: For vector database operations
- **Cohere API**: For embeddings and reranking
- **HTTP Header Auth**: For MCP server authentication
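
For readers unfamiliar with header-based authentication, the sketch below shows what such a check amounts to. n8n performs this check itself once the credential is configured; the function `is_authorized`, the header name, and the token values are hypothetical and only illustrate the idea.

```python
import hmac


def is_authorized(headers: dict[str, str], expected_token: str,
                  header_name: str = "Authorization") -> bool:
    """Return True if the request carries the expected header value.

    `header_name` and `expected_token` are illustrative assumptions;
    use whatever name and value your Header Auth credential defines.
    """
    if not expected_token:
        # Never treat an unset secret as "anything goes".
        return False
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(header_name.lower(), "")
    # compare_digest avoids leaking the secret through timing differences.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

An MCP client calling the server would need to send the same header name and value that the credential stores.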

🔧 Setup Instructions

1. Import the template into your n8n instance
2. Configure credentials for all required services
3. Set up a Weaviate collection named "KnowledgeDocuments"
4. Configure webhook paths for the MCP server and form trigger
5. Test the workflow by uploading a PDF document
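
As a rough guide to the collection step, the sketch below builds a class definition in the shape Weaviate's REST schema endpoint (`POST /v1/schema`) accepts. The property names are assumptions derived from the metadata fields listed under Features, and `"vectorizer": "none"` assumes the vectors are supplied by the workflow's Cohere embeddings rather than computed by Weaviate; check the imported workflow for the names it actually uses.

```python
import json


def knowledge_documents_schema(class_name: str = "KnowledgeDocuments") -> dict:
    """Build a Weaviate class definition for the document collection.

    Property names are hypothetical; only the collection name comes
    from the template description.
    """
    return {
        "class": class_name,
        "description": "Chunks of OCR-extracted PDF text with basic metadata",
        # Assumption: embeddings are computed outside Weaviate and sent with each object.
        "vectorizer": "none",
        "properties": [
            {"name": "filename", "dataType": ["text"]},
            {"name": "content", "dataType": ["text"]},
            {"name": "source", "dataType": ["text"]},
            # Weaviate's "date" type expects RFC 3339 timestamps.
            {"name": "uploadedAt", "dataType": ["date"]},
        ],
    }


if __name__ == "__main__":
    # Print the JSON body that could be sent to the schema endpoint.
    print(json.dumps(knowledge_documents_schema(), indent=2))
```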

📊 Workflow Overview

    PDF Upload   → Text Extraction → Document Processing → Vector Storage → AI Search
         ↓              ↓                 ↓                     ↓                ↓
    Form Trigger → Mistral OCR     → Prepare Metadata    → Weaviate DB    → MCP Server
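
The "Prepare Metadata" stage in the overview can be pictured as a small function that wraps the OCR output in a record before it is embedded and stored. This is a sketch under assumptions: the field names mirror the metadata listed under Features, and the function name, the default `source` value, and the `uploadedAt` key are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def prepare_document(filename: str, content: str, source: str = "pdf-upload",
                     uploaded_at: Optional[datetime] = None) -> dict[str, str]:
    """Attach basic metadata to text extracted from one PDF.

    Field names and the default source label are illustrative assumptions.
    """
    when = uploaded_at or datetime.now(timezone.utc)
    return {
        "filename": filename,
        "content": content.strip(),  # drop leading/trailing whitespace from OCR output
        "source": source,
        # Store the timestamp in UTC as an ISO 8601 string.
        "uploadedAt": when.astimezone(timezone.utc).isoformat(),
    }
```

Keeping the metadata alongside each chunk lets search results be traced back to the file they came from.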

🎯 Use Cases

- **Knowledge Base Management**: Create searchable repositories of company documents
- **Research Documentation**: Process and search through research papers and reports
- **Legal Document Search**: Index and search through legal documents and contracts
- **Technical Documentation**: Make technical manuals and guides searchable
- **Academic Literature**: Process and search through academic papers and publications

⚠️ Important Notes

- **Model Consistency**: Use the same embedding model for both storage and retrieval
- **Collection Management**: Ensure your Weaviate collection is properly configured
- **API Limits**: Be aware of rate limits for Mistral, Cohere, and Weaviate APIs
- **Document Size**: Consider chunking large documents for optimal processing
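
The model consistency note matters because vectors from different embedding models live in different spaces (and often have different dimensions), so similarity scores between them are meaningless. The toy in-memory index below illustrates one way to guard against a mismatch; the class `TinyIndex` and the model names in the usage are hypothetical, and it is not how Weaviate works internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyIndex:
    """Toy vector index that remembers which embedding model filled it."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self.items: list[tuple[str, list[float]]] = []

    def _check(self, model_name: str) -> None:
        # Refuse vectors produced by a different model than the stored ones.
        if model_name != self.model_name:
            raise ValueError(
                f"index was built with {self.model_name!r}, got {model_name!r}"
            )

    def add(self, text: str, vector: list[float], model_name: str) -> None:
        self._check(model_name)
        self.items.append((text, vector))

    def search(self, query_vector: list[float], model_name: str, top_k: int = 3) -> list[str]:
        self._check(model_name)
        scored = [(cosine_similarity(query_vector, vec), text) for text, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
        return [text for _, text in scored[:top_k]]
```

In the workflow itself, the equivalent safeguard is simply selecting the same Cohere model in the embeddings node used for insertion and the one used for retrieval.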

🔗 Related Resources

- n8n Documentation
- Weaviate Documentation
- Mistral AI Documentation
- Cohere Documentation
- MCP Protocol Documentation

📝 License

This template is provided as-is for educational and commercial use.

Complexity: intermediate
Created: 9/10/2025
Updated: 11/17/2025
