by Juan Sanchez
## 🧾 Personal Invoice Processor

This n8n workflow automates the extraction and organization of personal invoices in Colombia received via Gmail. It includes the following key steps:

### 🔁 Flow Summary

**Email Trigger**
Polls Gmail every 30 minutes for emails with `.zip` attachments (assumed to contain invoices). Expects ZIP files following DIAN standards.

**ZIP File Handling**
Extracts all files and filters only PDF and XML files for processing.

**Data Extraction & Processing**
Uses a LangChain Agent + OpenAI (GPT-4o-mini) to extract:
- Document type (Factura / Nota Crédito)
- Invoice number
- Issue date (YYYY-MM-DD)
- Issuer and recipient NIT (without the verification digit)
- Issuer's legal name (razón social)
- Subtotal, IVA (VAT), Total
- CUFE
- Purchase summary (max 20 words, formatted as a sentence)

**Validation**
Ensures Total = Subtotal + IVA using a calculator node.

**Storage**
- Uploads the original PDF to Google Drive.
- Renames the file to `YYYY-MM-DD-NUMERO_FACTURA.pdf`.
- Inserts or updates invoice details in Google Sheets using a unique key (`NIT_Emisor` + `Numero_Factura`) to prevent duplication.

> ⚙️ Designed for personal use with minimal latency tolerance and high automation reliability.
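The validation, dedup-key, and file-naming rules above can be sketched in a few lines. This is a minimal illustration, not the workflow's actual code; the field names (`Subtotal`, `IVA`, `Total`, `NIT_Emisor`, `Numero_Factura`, `Fecha_Emision`) are assumptions based on the description.

```javascript
// Hypothetical invoice shape, inferred from the extracted fields above.
function validateInvoice(invoice) {
  // Total must equal Subtotal + IVA (with a small rounding tolerance).
  return Math.abs(invoice.Total - (invoice.Subtotal + invoice.IVA)) < 0.01;
}

function invoiceKey(invoice) {
  // Unique key used to upsert into Google Sheets and avoid duplicates.
  return `${invoice.NIT_Emisor}_${invoice.Numero_Factura}`;
}

function driveFilename(invoice) {
  // Naming convention: YYYY-MM-DD-NUMERO_FACTURA.pdf
  return `${invoice.Fecha_Emision}-${invoice.Numero_Factura}.pdf`;
}
```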
by Davide
This workflow implements a Retrieval-Augmented Generation (RAG) system that:
- Stores vectorized documents in Qdrant,
- Retrieves relevant content based on user input,
- Generates AI answers using Google Gemini,
- Automatically cites the document sources (from Google Drive).

### Workflow Steps

**Create Qdrant Collection**
A REST API node creates a new collection in Qdrant with the specified vector size (1536) and cosine similarity.

**Load Files from Google Drive**
The workflow lists all files in a Google Drive folder, downloads them as plain text, and loops through each.

**Text Preprocessing & Embedding**
Documents are split into chunks (500 characters, with 50-character overlap). Embeddings are created using OpenAI embeddings (text-embedding-3-small assumed). Metadata (file name and ID) is attached to each chunk.

**Store in Qdrant**
All vectors, along with metadata, are inserted into the Qdrant collection.

**Chat Input & Retrieval**
When a chat message is received, the question is embedded and matched against Qdrant. The top 5 relevant document chunks are retrieved, and a Gemini model generates the answer based on those sources.

**Source Aggregation & Response**
File IDs and names are deduplicated, and the AI response is combined with a list of cited documents (filenames). Final output:

> AI Response
> Sources: ["Document1", "Document2"]

### Main Advantages

- **End-to-end Automation**: From document ingestion to chat response generation, fully automated with no manual steps.
- **Scalable Knowledge Base**: Easy to expand by simply adding files to the Google Drive folder.
- **Traceable Responses**: Each answer includes its source files, increasing transparency and trustworthiness.
- **Modular Design**: Each step (embedding, storage, retrieval, response) is isolated and reusable.
- **Multi-provider AI**: Combines OpenAI (for embeddings) and Google Gemini (for chat), optimizing performance and flexibility.
- **Secure & Customizable**: Uses API credentials and a configurable chunk size, collection name, etc.
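The chunking step (500 characters with 50-character overlap) can be sketched as below. Note this is a simplified fixed-size splitter for illustration; the workflow itself uses a recursive text splitter, which also respects separators like paragraph breaks.

```javascript
// Split text into fixed-size chunks with overlap, mirroring the
// 500/50 settings described above.
function chunkText(text, size = 500, overlap = 50) {
  const chunks = [];
  const step = size - overlap; // advance by size minus overlap each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

Each chunk would then be embedded and stored in Qdrant together with its file name and ID as metadata.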
### How It Works

**Document Processing & Vectorization**
The workflow retrieves documents from a specified Google Drive folder. Each file is downloaded, split into chunks (using a recursive text splitter), and converted into embeddings via OpenAI. The embeddings, along with metadata (file ID and name), are stored in a Qdrant vector database under the collection negozio-emporio-verde.

**Query Handling & Response Generation**
When a user submits a chat message, the workflow:
1. Embeds the query using OpenAI.
2. Retrieves the top 5 relevant document chunks from Qdrant.
3. Uses Google Gemini to generate a response based on the retrieved context.
4. Aggregates and deduplicates the source file names from the retrieved chunks.

The final output includes both the AI-generated response and a list of source documents (e.g., Sources: ["FAQ.pdf", "Policy.txt"]).

### Set Up Steps

**Configure Qdrant Collection**
- Replace QDRANTURL and COLLECTION in the "Create collection" HTTP node to initialize the Qdrant collection with:
  - Vector size: 1536 (OpenAI embedding dimension).
  - Distance metric: Cosine.
- Ensure the "Clear collection" node is configured to reset the collection if needed.

**Google Drive & OpenAI Integration**
- Link the Google Drive node to the target folder (Test Negozio in this example).
- Verify that the OpenAI and Google Gemini API credentials are correctly set in their respective nodes.

**Metadata & Output Customization**
- Adjust the "Aggregate" and "Response" nodes if additional metadata fields are needed.
- Modify the "Output" node to format the response (e.g., changing Sources: {{...}} to match your preferred style).

**Testing**
- Trigger the workflow manually to test document ingestion.
- Use the chat interface to verify that responses include accurate source attribution.

Note: Replace placeholder values (e.g., QDRANTURL) with actual endpoints before deployment.

Need help customizing? Contact me for consulting and support or add me on LinkedIn.
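The "Create collection" call described above boils down to a small JSON payload. The sketch below assumes Qdrant's standard REST API (`PUT {QDRANTURL}/collections/{COLLECTION}`); verify the exact request against your Qdrant version.

```javascript
// Build the collection-creation payload: 1536-dimensional vectors
// (the OpenAI embedding size) with cosine distance, as described above.
function collectionConfig(size = 1536, distance = 'Cosine') {
  return { vectors: { size, distance } };
}

// The HTTP node then issues roughly:
//   PUT {QDRANTURL}/collections/{COLLECTION}
//   Content-Type: application/json
//   body: JSON.stringify(collectionConfig())
```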
by JHH
## LLM/RAG Kaggle Development Assistant

An on-premises, domain-specific AI assistant for Kaggle (tested on binary disaster-tweet classification), combining an LLM, the n8n workflow engine, and Qdrant-backed Retrieval-Augmented Generation (RAG). Deploy via the containerized starter kit. Requires high-end GPU support, or patience. The initial chat should contain guidelines on what to produce, plus the challenge guidelines.

### Features

**Coding Assistance**
- "Real"-time Python code recommendations, debugging help, and data-science best practices
- Multi-turn conversational context

**Workflow Automation**
- n8n orchestration for LLM calls, document ingestion, and external API integrations

**Retrieval-Augmented Generation (RAG)**
- Qdrant vector database for competition-specific document lookup
- On-demand retrieval of Kaggle competition guidelines, tutorials, and notebooks, after conversion to HTML and ingestion into the RAG store

**On-Premises for Privacy**
- Locally hosted LLMs (via Ollama) – no external code or data transfer:
  - ALIENTELLIGENCE/contentsummarizer:latest for summarizing
  - qwen3:8b for chat and coding
  - mxbai-embed-large:latest for embedding
- GPU acceleration required

Based on: https://n8n.io/workflows/2339 (breakdown documents into study notes using templating, MistralAI and Qdrant)
by ConvertAPI
### Who is this for?
For developers and organizations that need to convert web pages to PDF.

### What problem is this workflow solving?
Converting a web page to PDF.

### What this workflow does
- Converts a web page to PDF.
- Stores the PDF file in the local file system.

### How to customize this workflow to your needs
- Open the HTTP Request node.
- Adjust the URL parameter (all endpoints can be found here).
- Add your secret to the Query Auth account parameter. Please create a ConvertAPI account to get an authentication secret.
- Change the parameter url to the webpage you want to convert to PDF.
- Optionally, additional Body Parameters can be added for the converter.
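For illustration, the request the HTTP Request node sends can be sketched as a URL builder. The endpoint path and parameter names (`Secret`, `Url`) below are assumptions based on ConvertAPI's web-to-PDF converter; check the endpoint list linked above for the authoritative form.

```javascript
// Build a hypothetical ConvertAPI web-to-PDF request URL from a secret
// and the page to convert (parameter names are assumptions).
function buildConvertRequest(secret, pageUrl) {
  const endpoint = 'https://v2.convertapi.com/convert/web/to/pdf';
  const params = new URLSearchParams({ Secret: secret, Url: pageUrl });
  return `${endpoint}?${params.toString()}`;
}
```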
by Miquel Colomer
Do you want to avoid communication problems when launching phone calls? This workflow verifies landline and mobile phone numbers using the uProc Get Parsed and Validated Phone tool, which has worldwide coverage.

You need to add your credentials (the real Email and API Key, found in the Integration section) to n8n.

The "Create Phone Item" node can be replaced by any other supported service that provides phone values, such as databases (MySQL, Postgres) or Typeform.

The "uProc" node returns the following fields for every parsed and validated phone number:
- country_prefix: the international country phone prefix number.
- country_code: the 2-digit ISO country code of the phone number.
- local_number: the phone number without the international prefix.
- formatted: a formatted version of the phone number, according to the detected country.
- valid: whether the phone number has a valid format and prefix.
- type: the phone number type (mobile, landline, or something else).

The "If" node checks whether the phone number is valid. You can use the result to mark invalid phone numbers in your database or discard them from future telemarketing campaigns.
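The branching that the "If" node performs can be sketched as follows, assuming the response shape listed above (a boolean `valid` flag per item):

```javascript
// Partition validated phone items into valid and invalid groups,
// mirroring the If node's true/false branches.
function splitByValidity(items) {
  return {
    valid: items.filter((i) => i.valid === true),
    invalid: items.filter((i) => i.valid !== true),
  };
}
```

The `invalid` branch is where you would mark or delete numbers in your database.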
by Lukas Kunhardt
## Intelligently Segment PDFs by Table of Contents

This workflow empowers you to automatically process PDF documents, intelligently identify or generate a hierarchical Table of Contents (ToC), and then segment the entire document's content based on these ToC headings. It effectively breaks down a large PDF into its constituent sections, each paired with its corresponding heading and hierarchical level.

### Why It's Useful

Unlock the true structure of your PDFs for granular access and advanced processing:

- **AI Agent Tool:** A key use case is to provide this workflow as a tool to an AI agent. The agent can then use the segmented output to "read" and navigate to specific sections of a document to answer questions, extract information, or perform tasks with much greater accuracy and efficiency.
- **Targeted Content Extraction:** Programmatically pull out specific chapters or subsections for focused analysis, summarization, reporting, or repurposing content.
- **Enhanced RAG Systems:** Improve your Retrieval-Augmented Generation (RAG) pipelines by feeding them well-defined, contextually relevant document sections instead of entire, monolithic PDFs. This leads to more precise AI-generated responses.
- **Modular Document Processing:** Process different parts of a document using distinct logic in subsequent n8n workflows by acting on individual sections.
- **Data Preparation:** Seamlessly convert lengthy PDFs into a structured format where each section (including its heading, level, and content in multiple formats) becomes a distinct, manageable item.

### How It Works

1. **Ingestion & Advanced Parsing:** The workflow ingests a PDF (via a provided URL, or a pre-set one for manual runs). It then utilizes Chunkr.ai to perform Optical Character Recognition (OCR) and parse the document into detailed structural elements, extracting text, HTML, and Markdown for each segment.
2. **AI-Powered Table of Contents Generation:** A Google Gemini AI model analyzes the initial pages of the document (where a ToC often resides) along with section headers extracted by Chunkr as a fallback. This allows it to construct an accurate, hierarchical Table of Contents in a structured JSON format, even if the PDF lacks an explicit ToC or if it's poorly formatted.
3. **Precise Content Segmentation:** Custom code then maps the AI-generated ToC headings to their corresponding content within the parsed document from Chunkr, determining the precise start and end of each section.
4. **Structured & Flexible Output:** The primary output provides each identified section as an individual n8n item. Each item includes the heading text, its hierarchical level (e.g., 1, 1.1, 2), and the full content of that section in Text, HTML, and Markdown formats. Optionally, the workflow can also reconstruct the entire document into a single, navigable HTML file or a clean Markdown file.

### What You Need

To run this workflow, you'll need:

- **Input PDF:**
  - When triggered by another workflow: a URL pointing to the PDF document.
  - When triggered manually: the workflow uses a pre-configured sample PDF from Google Drive for demonstration (this can be customized).
- **Chunkr.ai API Key:** Required for the initial parsing and OCR of the PDF document. You'll need to insert this into the relevant HTTP Request nodes.
- **Google Gemini API Credentials:** Necessary for the AI model to intelligently generate the Table of Contents. This should be configured in the Google Gemini Chat Model nodes.

### Outputs

The workflow primarily generates:

- **Individual Document Sections:** A series of n8n items, each representing a distinct section of the PDF and containing:
  - heading: the text of the section heading.
  - headingLevel: the hierarchical level of the heading (e.g., 1 for H1, 2 for H2).
  - sectionText: the plain text content of the section.
  - sectionHTML: the HTML content of the section.
  - sectionMarkdown: the Markdown content of the section.

Alternatively, you can configure the workflow to output:

- **Full Reconstructed Document:** A single HTML file, or a single Markdown file, representing the entire processed document.

This workflow is ideal for anyone looking to deconstruct PDFs into meaningful, manageable parts for advanced automation, AI integration, or detailed content analysis.
by Harshil Agrawal
This is an example that gets the logo, icon, and information of a company and stores it in Airtable. You can set the values that you want to store in the Set node. If you want to store the data in a different database (Google Sheets, Postgres, MongoDB, etc.), replace the Airtable node with that node. You can refer to the documentation to learn how to build this workflow from scratch.
by Harshil Agrawal
This example workflow allows you to create, update, and get a document in Google Cloud Firestore. The workflow uses the Set node to set the data; however, you might receive data from a different source. Add the node that receives the data before the Set node, and configure the values you want to insert into a document in the Set node. Also, update the Columns/attributes fields in the Google Cloud Firestore node.
by Harshil Agrawal
Based on your use case, you might want to trigger a workflow when new data gets added to your database. This workflow sends a message to Mattermost when new data is added to Google Sheets. The Interval node triggers the workflow every 45 minutes; you can modify the timing based on your use case, or even use the Cron node to trigger the workflow. If you wish to fetch new Tweets from Twitter instead, replace the Google Sheets node with the respective node and update the Function node accordingly.
by Harshil Agrawal
This workflow allows you to receive updates about the position of the ISS and add them to a table in TimescaleDB.

- Cron node: triggers the workflow every minute. You can configure the timing based on your use case.
- HTTP Request node: makes an HTTP request to an API that returns the position of the ISS. Based on your use case, you may want to fetch data from a different URL; enter the URL in the URL field.
- Set node: sets the information that we need in the workflow. Since we only need the timestamp, latitude, and longitude, we set these in the node. If you need other information, you can set it here as well.
- TimescaleDB node: stores the information in a table named iss. You can use a different table as well.
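The Set node's job can be sketched as picking three fields out of the API response. This assumes the ISS API returns a JSON object with `timestamp`, `latitude`, and `longitude` fields; the exact response shape depends on the API you point the HTTP Request node at.

```javascript
// Keep only the fields the iss table needs, dropping everything else
// the API returns.
function pickIssFields(response) {
  return {
    timestamp: response.timestamp,
    latitude: response.latitude,
    longitude: response.longitude,
  };
}
```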
by Shashikanth
Source code: I maintain this workflow here.

### Usage Guide

This workflow backs up all workflows as JSON files named in the [workflow_name].json format.

### Steps

1. **Create GitHub Repository** – Skip this step if using an existing repository.
2. **Add GitHub Credentials** – In Credentials, add the GitHub credential for the repository owner.
3. **Download and Import Workflow** – Import this workflow into n8n.
4. **Set Global Values** – In the Globals node, set the following:
   - repo.owner: GitHub username of the repository owner.
   - repo.name: Name of the repository for backups.
   - repo.path: Path to the folder within the repository where workflows will be saved.
5. **Configure GitHub Nodes** – Edit each GitHub node in the workflow to use the added credentials.

### Workflow Logic

Each workflow run handles files based on their status:
- **New workflow:** create a new file in the repository.
- **Unchanged workflow:** skip to the next item.
- **Changed workflow:** update the corresponding file in the repository.

### Current Limitations / Needs Work

- **Renamed workflows:** if a workflow is renamed in n8n, the old file remains in the repository.
- **Deleted workflows:** workflows deleted in n8n are not removed from the repository.
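The per-workflow branching above can be sketched as a simple comparison between the exported workflow JSON and the file currently in the repository (the real workflow does this via GitHub nodes; this only shows the decision logic):

```javascript
// Decide what to do with one workflow: create the file if it doesn't
// exist in the repo, skip if identical, update otherwise.
function classifyWorkflow(exportedJson, repoFileJson) {
  if (repoFileJson === null) return 'create';       // new workflow
  if (repoFileJson === exportedJson) return 'skip'; // unchanged
  return 'update';                                  // changed
}
```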
by Ludwig
### How It Works

- Scrapes company review data from Glassdoor using ScrapingBee.
- Extracts demographic-based ratings using AI-powered text analysis.
- Calculates workplace disparities with statistical measures like z-scores, effect sizes, and p-values.
- Generates visualizations (scatter plots, bar charts) to highlight patterns of discrimination or bias.

### Example Visualizations

### Set Up Steps

Estimated time: ~20 minutes.

- Replace the ScrapingBee and OpenAI credentials with your own.
- Input the company name you want to analyze (best results with large U.S.-based organizations).
- Run the workflow and review the AI-generated insights and visual reports.

This workflow empowers users to identify potential workplace discrimination trends, helping advocate for greater equity and accountability.

**Additional Credit:** Wes Medford, for algorithms and inspiration.
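The disparity statistics mentioned above follow standard formulas; as an illustration (not the workflow's exact computation), a z-score for the difference between two group mean ratings and Cohen's d as the effect size look like this:

```javascript
// Compare two demographic groups' mean ratings:
//   z = (meanA - meanB) / standard error of the difference
//   Cohen's d = (meanA - meanB) / pooled standard deviation
function groupDisparity(meanA, sdA, nA, meanB, sdB, nB) {
  const se = Math.sqrt((sdA * sdA) / nA + (sdB * sdB) / nB);
  const z = (meanA - meanB) / se;
  const pooledSd = Math.sqrt(
    ((nA - 1) * sdA * sdA + (nB - 1) * sdB * sdB) / (nA + nB - 2)
  );
  return { z, cohensD: (meanA - meanB) / pooledSd };
}
```

A large |z| (with a correspondingly small p-value) and a non-trivial effect size together flag a rating gap worth investigating.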