by Akshay
Overview

This project is an AI-powered WhatsApp virtual receptionist built with n8n that automatically handles both text and voice messages from customers. The workflow integrates Google Gemini, Pinecone, and the WhatsApp Business API to provide context-aware responses that read as natural and professional.

How It Works

Message Detection
The workflow begins when a message arrives on WhatsApp. It identifies whether the message is text or voice and routes it accordingly.

Voice Message Handling
Audio messages are securely downloaded from WhatsApp, converted to Base64, and sent to the Gemini API for transcription. The transcribed text is then passed to the AI Agent for further processing.

AI Agent Processing
The LangChain AI Agent acts as the brain of the system. It uses:
- **Google Gemini Chat Model** for natural language understanding and response generation.
- **Pinecone Vector Store** to retrieve company-specific information and product data.
- **Memory Buffer** to remember the last 20 user messages, keeping responses context-aware.

The agent also follows a set of custom communication rules: it replies only in approved languages, skips greetings, and focuses on direct, helpful, and professional responses (e.g., product recommendations, support, or guidance).

Knowledge Retrieval
The AI Agent connects to a Pinecone database containing detailed company data, such as product catalogs or service FAQs. Using Gemini-generated embeddings, it retrieves the most relevant information for each user query.

Response Delivery
Once the AI Agent has prepared the response, it is sent back to the user via WhatsApp, completing the conversational loop.

Who It’s For

This system suits businesses that want to automate customer communication through WhatsApp. It is especially valuable for:
- **Product-based companies** with frequent customer inquiries.
- **Service providers** offering 24/7 customer assistance or quote requests.
- **SMBs** looking to scale their communication without hiring additional staff.

Tech Stack & Requirements
- **n8n**: Workflow automation and orchestration.
- **WhatsApp Cloud API**: For sending and receiving messages.
- **Google Gemini (PaLM)**: For LLM-based transcription and response generation.
- **Pinecone**: Vector database for product and service knowledge retrieval.
- **LangChain Integration**: For connecting memory, vector store, and reasoning tools.
- **Custom Business Rules**: Configurable within the AI Agent node to manage tone, style, and workflow behavior.

Key Features
- Handles both text and voice messages.
- Responds in multiple languages, including English.
- Maintains conversation memory per user session.
- Retrieves company-specific information using vector search.
- Fully automated, with customizable behavior for different industries or use cases.

Setup Instructions

1. Prerequisites
Before importing the workflow, ensure you have:
- An active n8n instance (self-hosted or n8n Cloud).
- **WhatsApp Cloud API credentials** from Meta.
- **Google Gemini API key** with model access (for chat and transcription).
- **Pinecone API key** with a preconfigured vector index containing your company data.

2. Environment Setup
- Add all required credentials under Settings → Credentials in n8n.
- Add environment variables (if applicable) for keys like:
  - GOOGLE_API_KEY=your_google_gemini_key
  - PINECONE_API_KEY=your_pinecone_key
  - WHATSAPP_ACCESS_TOKEN=your_whatsapp_token

3. Pinecone Configuration
- Create a Pinecone index named, for example, products-index.
- Upload company documents or product details as vector embeddings using Gemini or LangChain utilities.
- Adjust the retrieval limit in the Pinecone node settings for broader or narrower search responses.

4. WhatsApp API Configuration
- Set up a WhatsApp Business Account via the Meta Developer Dashboard.
- Create a webhook endpoint URL (n8n’s public URL) to receive WhatsApp messages.
- Use the WhatsApp Trigger node to capture messages in real time.

5. AI Agent Customization
You can personalize how the AI behaves by editing the system prompt inside the AI Agent node:
- Modify tone, response length, or product focus.
- Add new “rules” for language preferences or conversation flow.
- Include links or custom text output (e.g., quotation formats, product catalog messages).

6. Handling Voice Messages
- Ensure your WhatsApp Business Account has media message permissions enabled.
- Verify that the HTTP Request node that connects to the Gemini API for transcription is correctly authenticated.
- Adjust the transcription model or prompt if you prefer shorter, keyword-based outputs.

7. Testing
- Send both text and voice messages from a test WhatsApp number.
- Check response time and message formatting.
- Use n8n’s execution logs to debug errors (especially for media downloads or API credentials).

Customization Options

🧩 AI Behavior
- Modify the AI Agent’s system message to adapt tone and personality (e.g., sales-oriented, support-driven).
- Update the memory length (default: last 20 messages) for longer or shorter conversations.

🌍 Multi-language Support
- Add or remove allowed languages in the rules section of the AI Agent node.
- For multilingual businesses, duplicate the AI Agent path and route messages by language detection.

📦 Industry Adaptation
- Swap the Pinecone dataset to suit different industries, such as retail, hospitality, or logistics.
- Replace product data with FAQs, customer records, or support documentation.
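The message detection, voice handling, and memory steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the workflow's actual node logic: `route_message` and `encode_audio` are hypothetical names, and the payload shape assumes the `type`, `text.body`, and `audio.id` fields of a WhatsApp Cloud API message object.

```python
import base64
from collections import deque

# Assumed window size, matching the "last 20 messages" described above.
MEMORY_WINDOW = 20


def route_message(message: dict) -> tuple[str, str]:
    """Return (route, payload) for one incoming WhatsApp message object."""
    kind = message.get("type")
    if kind == "text":
        return "text", message["text"]["body"]
    if kind == "audio":
        # The media ID is what a later step would exchange for the audio file.
        return "voice", message["audio"]["id"]
    return "unsupported", ""


def encode_audio(raw: bytes) -> str:
    """Base64-encode downloaded audio bytes for a transcription request."""
    return base64.b64encode(raw).decode("ascii")


# A sliding window that silently drops the oldest entry once it is full.
memory = deque(maxlen=MEMORY_WINDOW)
for i in range(25):
    memory.append(f"message {i}")
```

After the loop, `memory` holds only the 20 most recent entries, which is the behavior the Memory Buffer setting describes.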
by DIGITAL BIZ TECH
Travel Reimbursement - OCR & Expense Extraction Workflow

Overview
This is a lightweight n8n workflow that accepts chat input and uploaded receipts, runs OCR, stores parsed results in Supabase, and uses an AI agent to extract structured travel expense data and compute totals. It is designed for zero-retention operation and fast integration.

Workflow Structure
- **Frontend:** Chat UI trigger that accepts text and file uploads.
- **Preprocessing:** Binary normalization + per-file OCR request.
- **Storage:** Store OCR-parsed blocks in Supabase temp_table.
- **Core AI:** Travel reimbursement agent that extracts fields, infers missing values, and calculates totals using the Calculator tool.
- **Output:** Agent responds to the chat with a concise expense summary and breakdowns.

Chat Trigger (Frontend)
- **Trigger node:** When chat message received
- public: true, allowFileUploads: true; sessionId is used to tie uploads to the chat session.
- Custom CSS and initial messages are configured for the user experience.

Binary Presence Check
- **Node:** CHECK IF BINARY FILE IS PRESENT OR NOT (IF)
- Checks whether the incoming payload contains files.
- If files are present -> route to Split Out -> NORMALIZE binary file -> OCR (ANY OCR API) -> STORE OCR OUTPUT -> Merge.
- If no files are present -> route directly to Merge -> Travel reimbursement agent.

Binary Normalization
- **Node:** Split Out and NORMALIZE binary file (Code)
- Split Out extracts binary entries into a data field.
- NORMALIZE binary file picks the first binary key and rewrites the payload to binary.data for a consistent downstream shape.

OCR
- **Node:** OCR (ANY OCR API ) (HTTP Request)
- Sends multipart/form-data to the OCR endpoint and expects JSONL or JSON with blocks.
- Body includes mode=single, output_type=jsonl, include_images=false.

Store OCR Output
- **Node:** STORE OCR OUTPUT (Supabase)
- Upserts into temp_table with session_id, parsed blocks, and file_name.
- Used by the agent to fetch previously uploaded receipts for the same session.
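The binary normalization step described above (pick the first binary key and rewrite it to binary.data) can be sketched as follows. This is a Python illustration of the idea only; the actual Code node may be written differently, and `normalize_binary` is a hypothetical name.

```python
def normalize_binary(item: dict) -> dict:
    """Return a copy of the item whose first binary entry is stored under 'data'."""
    binary = item.get("binary") or {}
    if not binary:
        return item  # nothing to normalize
    first_key = next(iter(binary))  # dicts preserve insertion order
    return {**item, "binary": {"data": binary[first_key]}}
```

Downstream nodes can then always read `binary.data` regardless of the key the upload originally arrived under.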
Memory & Tooling
- **Nodes:** Simple Memory and Simple Memory1 (memoryBufferWindow). Keep the last 10 messages for session context.
- **Node:** Calculator1 (toolCalculator). Used by the agent to sum multiple charges and handle currency arithmetic and totals.

Travel Reimbursement Agent (Core)
- **Node:** Travel reimbursement agent (LangChain agent)
- Model: Mistral Cloud Chat Model (mistral-medium-latest)
- Behavior:
  - Parse OCR blocks and non-file chat input.
  - Extract required fields: vendor_name, category, invoice_date, checkin_date, checkout_date, time, currency, total_amount, notes, estimated.
  - When fields are missing, infer them logically and mark estimated: true.
  - Use the Calculator tool to sum totals across multiple receipts.
  - Fetch stored OCR entries from Supabase when the user asks for session summaries.
  - Always attempt extraction; never reply with "unclear" or ask for a reupload unless the user requests audit-grade precision.
- Final output: Clean expense table and Grand Total formatted for chat.

Data Flow Summary
1. User sends a chat message, with or without a file.
2. IF a file is present -> Split Out -> Normalize -> OCR -> Store OCR output -> Merge with chat payload.
3. Travel reimbursement agent consumes the merged item, extracts fields, uses the Calculator tool for sums, and replies with a formatted expense summary.

Integrations Used

| Service | Purpose | Credential |
|---------|---------|-----------|
| Mistral Cloud | LLM for agent | Mistral account |
| Supabase | Store parsed OCR blocks and session data | Supabase account |
| OCR API | Text extraction from images/PDFs | Configurable HTTP endpoint |
| n8n Core | Flow control, parsing, editing | Native |

Agent System Prompt Summary

> You are a Travel Expense Extraction and Calculation AI. Extract vendor, dates, currency, category, and total amounts from uploaded receipts, invoices, hotel bills, PDFs, and images. Infer values when necessary and mark them as estimated. When asked, fetch session entries from Supabase and compute totals using the Calculator tool. Respond in a concise business professional format with a category-wise breakdown and a Grand Total. Never reply "unclear" or ask for a reupload unless explicitly asked.

Required final response format example:

Key Features
- Zero-retention-friendly design: OCR output is stored only in temp_table, per session.
- Robust extraction with inference when OCR quality is imperfect.
- Session aware: the agent retrieves stored receipts for consolidated totals.
- Calculator integration for accurate numeric sums and currency handling.
- Configurable OCR endpoint, so you can swap providers without changing logic.

Setup Checklist
- Add Mistral Cloud and Supabase credentials.
- Configure the OCR endpoint to accept multipart uploads and return blocks.
- Create the temp_table schema with session_id, file, file_name.
- Test with single receipts, multipage PDFs, and mixed uploads.
- Validate agent responses and Calculator totals.

Summary
A practical n8n workflow for travel expense automation: accept receipts, run OCR, store parsed data per session, extract structured fields via an AI agent, compute totals, and return clean expense summaries in chat. Built for reliability and easy integration.

Need Help or More Workflows?
We can integrate this into your environment, tune the agent prompt, or adapt it for different OCR providers. We can help you set it up for free, from connecting credentials to deploying it live.

Contact: shilpa.raju@digitalbiz.tech
Website: https://www.digitalbiz.tech
LinkedIn: https://www.linkedin.com/company/digital-biz-tech/
You can also DM us on LinkedIn for any help.
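To make the category-wise breakdown and Grand Total concrete, here is a minimal Python sketch of the arithmetic the agent delegates to the Calculator tool. It is an illustration, not the workflow's implementation: `summarize` is a hypothetical helper, the records reuse the field names listed above, and all amounts are assumed to share one currency.

```python
from collections import defaultdict
from decimal import Decimal


def summarize(expenses: list[dict]) -> dict:
    """Sum total_amount per category and overall, using exact decimal arithmetic."""
    by_category = defaultdict(Decimal)  # missing categories start at Decimal("0")
    for expense in expenses:
        # Going through str() avoids binary floating-point rounding errors.
        by_category[expense["category"]] += Decimal(str(expense["total_amount"]))
    return {
        "by_category": dict(by_category),
        "grand_total": sum(by_category.values(), Decimal("0")),
        "has_estimates": any(e.get("estimated") for e in expenses),
    }
```

Using `Decimal` rather than floats keeps totals exact to the cent; the `has_estimates` flag lets a summary point out that some values were inferred.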
by Trung Tran
Try It Out, HireMind – AI-Driven Resume Intelligence Pipeline!

This n8n template demonstrates how to automate resume screening and evaluation using AI to improve candidate processing and reduce manual HR effort.

A smart and reliable resume screening pipeline for modern HR teams. This workflow combines Google Drive (JD & CV storage), OpenAI (GPT-4-based evaluation), Google Sheets (position mapping + result log), and Slack/SendGrid integrations for real-time communication. Automatically extract, evaluate, and track candidate applications with clarity and consistency.

How it works
- A candidate submits their application using a form that includes name, email, CV (PDF), and a selected job role.
- The CV is uploaded to Google Drive for record-keeping and later reference.
- The Profile Analyzer Agent reads the uploaded resume, extracts structured candidate information, and transforms it into a standardized JSON format using GPT-4 and a custom output parser.
- The corresponding job description PDF is retrieved automatically, using the link stored in a Google Sheet for the selected job role.
- The HR Expert Agent evaluates the candidate profile against the job description using another GPT-4 model, generating a structured assessment that includes strengths, gaps, and an overall recommendation.
- The evaluation result is parsed and formatted for output.
- The evaluation score is used to mark the candidate as qualified or unqualified. Depending on the outcome, either an email is sent to the applicant or a message is sent to the hiring team for the next step in the process.
- The final evaluation result is stored in a Google Sheet for long-term tracking and reporting.

Google Drive structure

├── jd                        # Google Drive folder to store your JDs (PDF)
│   ├── Backend_Engineer.pdf
│   ├── Azure_DevOps_Lead.pdf
│   └── ...
│
├── cv                        # Google Drive folder where the workflow uploads candidate resumes
│   ├── John_Doe_DevOps.pdf
│   ├── Jane_Smith_FullStack.pdf
│   └── ...
│
├── Positions (Sample: https://docs.google.com/spreadsheets/d/1pW0muHp1NXwh2GiRvGVwGGRYCkcMR7z8NyS9wvSPYjs/edit?usp=sharing)  # 📋 Mapping Table: Job Role ↔ Job Description (Link)
│   └── Columns:
│       - Job Role
│       - Job Description File URL (PDF in jd/)
│
└── Evaluation form (Google Sheet)  # ✅ Final AI Evaluation Results

How to use

Set up credentials and integrations:
- Connect your OpenAI account (GPT-4 API).
- Enable Google Cloud APIs:
  - Google Sheets API (for reading job roles and saving evaluation results)
  - Google Drive API (for storing CVs and job descriptions)
- Set up SendGrid (to send email responses to candidates).
- Connect Slack (to send messages to the hiring team).

Prepare your Google Drive structure:
- Create a root folder, then inside it create:
  - /jd → Store all job descriptions in PDF format
  - /cv → This is where candidate CVs will be uploaded automatically
- Create a Google Sheet named Positions with the following structure:

| Job Role | Job Description Link |
|------------------------------|----------------------------------------|
| Azure DevOps Engineer | https://drive.google.com/xxx/jd1.pdf |
| Full-Stack Developer (.NET) | https://drive.google.com/xxx/jd2.pdf |

Update your application form:
- Use the built-in form, or connect your own (e.g., Typeform, Tally, Webflow, etc.).
- Ensure the Job Role dropdown options exactly match the roles in the Positions sheet.

Run the AI workflow. When a candidate submits the form:
- Their CV is uploaded to the /cv folder.
- The job role is used to match the JD from /jd.
- The Profile Analyzer Agent extracts candidate info from the CV.
- The HR Expert Agent evaluates the candidate against the matched JD using GPT-4.

Distribute and store results:
- Store the evaluation results in the Evaluation form Google Sheet.
- Optionally notify your team:
  - ✉️ Send an email to the candidate using SendGrid
  - 💬 Send a Slack message to the hiring team with a summary and next steps

Requirements
- OpenAI GPT-4 account for both the Profile Analyzer and HR Expert Agents
- Google Drive account (for storing CVs and the evaluation sheet)
- Google Sheets API credentials (for JD source and evaluation results)

Need Help?
Join the n8n Discord or ask in the n8n Forum!

Happy Hiring! 🚀
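Two steps in this pipeline are simple enough to sketch in Python: the exact-match lookup of a job description from the Positions sheet, and the score-based routing. Both functions are hypothetical illustrations. The column names follow the Positions table above, while the threshold value and the mapping of qualified candidates to Slack and unqualified candidates to email are assumptions that the description does not spell out.

```python
def find_jd_link(positions: list[dict], job_role: str) -> str:
    """Return the JD link whose 'Job Role' matches the form value exactly."""
    for row in positions:
        if row["Job Role"] == job_role:
            return row["Job Description Link"]
    raise KeyError(f"No job description mapped for role: {job_role!r}")


def next_step(score: float, threshold: float = 7.0) -> str:
    """Pick the follow-up channel; the default threshold is a made-up value."""
    if score >= threshold:
        return "notify_hiring_team_slack"
    return "email_candidate_sendgrid"
```

The lookup is deliberately case-sensitive, which is why the form's Job Role options need to match the sheet exactly.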
by Davide
This workflow creates an AI-powered chatbot that generates custom songs through an interactive conversation, then uploads the results to Google Drive. This workflow transforms n8n into a complete AI music production pipeline by combining: Conversational AI Structured data validation Tool orchestration External music generation API Cloud automation It demonstrates a powerful hybrid architecture: LLM Agent + Tools + API + Storage + Async Control Flow Key Advantages 1. ✅ Fully Automated AI Music Production From idea → to lyrics → to full generated track → to cloud storage All handled automatically. 2. ✅ Conversational UX Users don’t need technical knowledge. The AI collects missing information step-by-step. 3. ✅ Smart Tool Selection The agent dynamically chooses: Songwriter tool (for original lyrics) Search tool (for existing lyrics) This makes the system adaptive and intelligent. 4. ✅ Structured & Error-Safe Design Strict JSON schema enforcement Output parsing and validation Cleanup of malformed LLM responses Reduces failure rate dramatically. 5. ✅ Asynchronous API Handling Uses webhook-based resume Handles long-running AI generation Supports multiple song outputs Scalable and production-ready. 6. ✅ Modular & Extensible The architecture allows: Switching LLM provider Changing music API Adding new tools (e.g., cover art generation) Supporting different vocal styles or languages 7. ✅ Memory-Enabled Conversations Uses buffer memory (last 10 messages) Maintains conversational context and continuity. 8. ✅ Automatic File Management Generated songs are: Automatically downloaded Properly renamed Stored in Google Drive No manual file handling required. How it Works Here's the flow: User Interaction: The workflow starts with a chat trigger that receives user messages. A "Music Producer Agent" powered by Google Gemini engages with the user conversationally to gather all necessary song parameters. 
Data Collection: The agent collects four essential pieces of information: Song title Musical style (genre) Lyrics (prompt) - either generated by calling the "Songwriter" tool or searched online via the "Search songs" tool Negative tags (styles/elements to avoid) Validation & Formatting: The collected data passes through an IF condition checking for valid JSON format, then a Code node parses and cleans the JSON output. A "Fix Json Structure" node ensures proper formatting with strict rules (no line breaks, no double quotes). Song Generation: The formatted data is sent to the Kie.ai API (HTTP Request node) which generates the actual music track. The workflow includes a callback URL for asynchronous processing. Wait & Retrieve: A Wait node pauses execution until the Kie.ai API sends a webhook callback with the generated songs. The "Get songs" node then retrieves the song data. Process Results: The response is split out, and a Loop Over Items node processes each generated song individually. For each song, the workflow: Downloads the audio file via HTTP request Uploads it to a specified Google Drive folder with a timestamped filename Setup steps API Credentials (3 required): Google Gemini (PaLM) API: Configure in the two Gemini Chat Model nodes Gemini Search API: Set up in the "Search songs" tool node Kie AI Bearer Token: Add in the HTTP Request nodes (Create song and Get songs) Google Drive Configuration: Authenticate Google Drive OAuth2 in the "Upload song" node Verify/modify the folder ID if needed Ensure the Drive has proper write permissions Webhook Setup: The Wait node has a webhook ID that needs to be publicly accessible Configure this URL in your Kie.ai API settings as the callback endpoint Optional Customizations: Adjust the AI agent prompts in the "Music Producer Agent" and "Songwriter" nodes Modify song generation parameters in the Kie.ai API call (styleWeight, weirdnessConstraint, etc.) 
Update the Google Drive folder path for song storage Change the vocal gender or other music generation settings in the "Create song" node Testing: Activate the workflow and start a chat session to test song generation with sample requests like "Write a pop song about summer" or "Find lyrics for 'Bohemian Rhapsody' and make it in rock style". 👉 Subscribe to my new YouTube channel. Here I’ll share videos and Shorts with practical tutorials and FREE templates for n8n. Need help customizing? Contact me for consulting and support or add me on LinkedIn.
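The validation and cleanup step (parse the agent's JSON, strip malformed wrapping, check the four required fields) can be sketched in Python as below. This is an illustration under assumptions rather than the workflow's Code node: `parse_agent_output` is a hypothetical name, and the key names `title`, `style`, `prompt`, and `negativeTags` are assumed labels for the four collected items.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence marker, built here to avoid writing it literally
REQUIRED_FIELDS = ("title", "style", "prompt", "negativeTags")  # assumed key names


def parse_agent_output(raw: str) -> dict:
    """Extract and validate the JSON object from a raw LLM reply."""
    # Remove code fence markers (optionally tagged "json") if the model added them.
    cleaned = re.sub(re.escape(FENCE) + r"(?:json)?", "", raw).strip()
    # Keep only the outermost {...} span to drop any surrounding chatter.
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in agent output")
    data = json.loads(cleaned[start : end + 1])
    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

Failing loudly on a missing field keeps an incomplete request from reaching the music generation API.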
by Pinecone
Try it out

This n8n workflow template lets you chat with your Google Drive documents (.docx, .json, .md, .txt, .pdf) using OpenAI and Pinecone Assistant. It retrieves relevant context from your files in real time so you can get accurate, context-aware answers about your proprietary data without needing to train your own LLM.

What is Pinecone Assistant?

Pinecone Assistant allows you to build production-grade chat and agent-based applications quickly. It abstracts the complexities of implementing retrieval-augmented generation (RAG) systems by managing the chunking, embedding, storage, query planning, vector search, model orchestration, and reranking for you.

Prerequisites
- A Pinecone account and API key
- A GCP project with the Google Drive API enabled and configured
  - Note: When setting up the OAuth consent screen, skip steps 8-10 if running on localhost
- An OpenAI account and API key

Setup
1. Create a Pinecone Assistant in the Pinecone Console
   - Name your Assistant n8n-assistant and create it in the United States region
   - If you use a different name or region, update the related nodes to reflect these changes
   - No need to configure a Chat model or Assistant instructions
2. Set up your Google Drive OAuth2 API credential in n8n
   - In the File added node -> Credential to connect with, select Create new credential
   - Set the Client ID and Client Secret from the values generated in the prerequisites
   - Set the OAuth Redirect URL from the n8n credential in the Google Cloud Console
   - Name this credential Google Drive account so that other nodes reference it
3. Set up the Pinecone API key credential in n8n
   - In the Upload file to assistant node -> PineconeApi section, select Create new credential
   - Paste your Pinecone API key into the API Key field
4. Set up the Pinecone MCP Bearer auth credential in n8n
   - In the Pinecone Assistant node -> Credential for Bearer Auth section, select Create new credential
   - Set the Bearer Token field to the Pinecone API key used in the previous step
5. Set up the OpenAI credential in n8n
   - In the OpenAI Chat Model node -> Credential to connect with, select Create new credential
   - Set the API Key field to your OpenAI API key
6. Add your files to a Drive folder named n8n-pinecone-demo in the root of your My Drive
   - If you use a different folder name, you'll need to update the Google Drive triggers to reflect that change
7. Activate the workflow or test it with a manual execution to ingest the documents
8. Chat with your docs!

Ideas for customizing this workflow
- Customize the System Message on the AI Agent node for your use case to indicate what kind of knowledge is stored in Pinecone Assistant
- Change the top_k value of results returned from Assistant by adding "and should set a top_k of 3" to the System Message to help manage token consumption
- Configure the Context Window Length in the Conversation Memory node
- Swap out the Conversation Memory node for one that is more persistent
- Make the chat node publicly available or create your own chat interface that calls the chat webhook URL

Need help?

You can find help by asking in the Pinecone Discord community, asking on the Pinecone Forum, or filing an issue on this repo.
by Don Jayamaha Jr
A fully autonomous HTX Spot Market AI Agent (Huobi AI Agent) built using GPT-4o and Telegram. This workflow is the primary interface, orchestrating all internal reasoning, trading logic, and output formatting.

⚙️ Core Features
- 🧠 LLM-Powered Intelligence: Built on GPT-4o with advanced reasoning
- ⏱️ Multi-Timeframe Support: 15m, 1h, 4h, and 1d indicator logic
- 🧩 Self-Contained Multi-Agent Workflow: No external subflows required
- 🧮 Real-Time HTX Market Data: Live spot price, volume, 24h stats, and order book
- 📲 Telegram Bot Integration: Interact via chat or schedule
- 🔄 Autonomous Runs: Support for webhook, schedule, or Telegram triggers

📥 Input Examples

| User Input | Agent Action |
| --------------- | --------------------------------------------- |
| btc | Returns 15m + 1h analysis for BTC |
| eth 4h | Returns 4-hour swing data for ETH |
| bnbusdt today | Full day snapshot with technicals + 24h stats |

🖥️ Telegram Output Sample

📊 BTC/USDT Market Summary
💰 Price: $62,400
📉 24h Stats: High $63,020 | Low $60,780 | Volume: 89,000 BTC
📈 1h Indicators:
• RSI: 68.1 → Overbought
• MACD: Bearish crossover
• BB: Tight squeeze forming
• ADX: 26.5 → Strengthening trend
📉 Support: $60,200
📈 Resistance: $63,800

🛠️ Setup Instructions
1. Create your Telegram Bot using @BotFather
2. Add the Bot Token in n8n Telegram credentials
3. Add your GPT-4o or OpenAI-compatible key under HTTP credentials in n8n
4. (Optional) Add your HTX API credentials if expanding to authenticated endpoints
5. Deploy this main workflow using:
   - ✅ Webhook (HTTP Request Trigger)
   - ✅ Telegram messages
   - ✅ Cron / Scheduled automation

🧠 Internal Architecture

| Component | Role |
| ------------------ | -------------------------------------------------------- |
| 🔄 Telegram Trigger | Entry point for external or manual signal |
| 🧠 GPT-4o | Symbol + timeframe extraction + strategy generation |
| 📊 Data Collector | Internal tools fetch price, indicators, order book, etc. |
| 🧮 Reasoning Layer | Merges everything into a trading signal summary |
| 💬 Telegram Output | Sends formatted HTML report via Telegram |

📌 Use Case Examples

| Scenario | Outcome |
| -------------------------------------- | ------------------------------------------------------- |
| Auto-run every 4 hours | Sends new HTX signal summary to Telegram |
| Human requests “eth 1h” | Bot replies with real-time 1h chart-based summary |
| System-wide trigger from another agent | Invokes webhook and returns response to parent workflow |

🧾 Licensing & Attribution
© 2025 Treasurium Capital Limited Company
Architecture, prompts, and trade report structure are IP-protected. No unauthorized rebranding permitted.

🔗 For support: Don Jayamaha – LinkedIn
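The input examples above imply a mapping from a short chat message to a symbol and one or more timeframes. In the workflow, GPT-4o performs this extraction; the Python sketch below is only a deterministic stand-in that illustrates the expected result. `parse_request` is a hypothetical name, and the USDT default quote, the lowercase symbol format, and the mapping of "today" to the 1d timeframe are assumptions.

```python
TIMEFRAMES = {"15m", "1h", "4h", "1d"}
DEFAULT_TIMEFRAMES = ["15m", "1h"]  # used when the message names no timeframe


def parse_request(text: str) -> tuple[str, list[str]]:
    """Split a chat message such as 'eth 4h' into (symbol, timeframes)."""
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("empty request")
    symbol = tokens[0]
    if not symbol.endswith("usdt"):
        symbol += "usdt"  # assumption: bare tickers are quoted against USDT
    timeframes = [t for t in tokens[1:] if t in TIMEFRAMES]
    if "today" in tokens[1:]:
        timeframes = ["1d"]  # assumption: "today" maps to the daily timeframe
    return symbol, timeframes or list(DEFAULT_TIMEFRAMES)
```

An LLM handles far looser phrasing than this, but a small parser like it can serve as a cheap fallback or as a test oracle for the agent's extraction.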
by Taiwo Hassan
SecretOps, DevSecOps Real-Time Repos Secret Leak Remediation

SecretOps is an n8n security automation workflow that monitors Git push events, detects high-risk secrets in commits, and automatically responds in real time. Unlike typical scanners that only notify, SecretOps acts immediately:
- Revokes leaked AWS access keys
- Creates incident tickets in Jira
- Alerts the security team via Slack
- Uses AI as a Security Analyst to decide the correct response

This workflow demonstrates how n8n can function as a lightweight SOAR (Security Orchestration, Automation, and Response) system for DevOps teams.

🚨 The Problem

Developers sometimes commit secrets such as:
- AWS access keys
- Payment processor API keys (Paystack / Stripe)
- Database connection URLs

These leaks can result in:
- Cloud infrastructure takeover
- Financial theft
- Full database compromise

Most tools detect and notify. SecretOps detects and reacts.

🧠 How It Works

1) Git Push Webhook
SecretOps listens to repository push events from GitHub/GitLab.

2) Deterministic Secret Detection (Code Node)
A Code node scans changed files and extracts only high-impact secrets:
- AKIA... → AWS access keys
- sk_live_, pk_test_ → payment processor keys
- postgres://, mongodb://, mysql://, redis:// → database URLs

3) AI Security Analyst
An AI node receives the detected items and decides the correct action:
- REVOKE_AWS_KEY
- PAYMENT_PROCESSOR_KEY_ALERT
- ROTATE_DB_PASSWORD
- IGNORE_KEY

It also generates ready-to-use Jira ticket content and Slack alert messages.

4) Automated Response (Switch)

| Action | Automated Response |
|--------------------------------|-----------------------------------------------------------------------|
| REVOKE_AWS_KEY | Disable key in AWS IAM → Create Jira ticket → Send Slack alert |
| PAYMENT_PROCESSOR_KEY_ALERT | Create Jira ticket → Send Slack alert |
| ROTATE_DB_PASSWORD | Create Jira ticket → Send Slack alert |
| IGNORE_KEY | End workflow |

⚡ What Makes This Unique
- Immediate containment of AWS key leaks (set to Inactive automatically)
- AI used for decision-making, not detection
- Built-in incident workflow for developers and security teams
- Minimal false positives by focusing only on real, high-risk secrets
- Shows n8n as a practical DevSecOps automation tool

🧩 Requirements
- GitHub or GitLab webhook
- AWS credentials with IAM permissions
- Jira project access
- Slack webhook or bot token
- n8n with AI node enabled

🛡️ Real-World Impact

SecretOps turns secret leaks from a silent vulnerability into an immediate, traceable, and automated incident response, reducing the window of exploitation from hours to seconds. It is aimed at DevOps, security teams, and engineering organizations that want proactive protection without complex security tooling.
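The deterministic detection in step 2 can be approximated with a few regular expressions. The Python sketch below is an illustration, not the workflow's actual Code node: the exact patterns are assumptions built from the prefixes listed above (including the common 20-character shape of AWS access key IDs), and `find_secrets` is a hypothetical name.

```python
import re

# Assumed patterns derived from the prefixes named in step 2.
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "payment_key": re.compile(r"\b(?:sk_live_|pk_test_)[0-9A-Za-z]+"),
    "database_url": re.compile(r"\b(?:postgres|mongodb|mysql|redis)://\S+"),
}


def find_secrets(text: str) -> list[dict]:
    """Return every pattern match in the text, tagged with its secret type."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({"type": kind, "value": match.group(0)})
    return findings
```

Keeping detection rule-based and passing only the findings to the AI node matches the division of labor described above: deterministic matching for detection, AI for choosing the response.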
by Khairul Muhtadin
Transform raw investment memorandums and financial decks into comprehensive, professional Due Diligence (DD) PDF reports. This workflow automates document parsing via LlamaParse, enriches internal data with real-time web intelligence using Decodo, and uses an AI Agent to synthesize structured financial analysis, risk assessments, and investment theses.

Why Use This Workflow?
- **Time Savings:** Reduces initial deal screening and report generation from 6–8 hours of manual analysis to under 5 minutes.
- **Accuracy & Depth:** Employs a multi-query RAG (Retrieval-Augmented Generation) strategy that cross-references internal deal documents with verified external web evidence.
- **Cost Reduction:** Reduces the junior analyst hours spent on preliminary data gathering and document summarization.
- **Scalability:** Processes multiple deals simultaneously, maintaining a consistent reporting standard across your entire pipeline.

Ideal For
- **Venture Capital & Private Equity:** Rapidly assessing incoming pitch decks and CIMs (Confidential Information Memorandums).
- **M&A Advisory Teams:** Automating the creation of standardized target company profiles and risk summaries.
- **Investment Analysts:** Generating structured data from unstructured PDFs to feed into internal valuation models.

How It Works
1. Trigger: A webhook receives document uploads (PDF, DOCX, PPTX) via a custom portal or API.
2. Data Collection: LlamaParse converts complex document layouts into clean Markdown, preserving tables and financial structures.
3. Processing: The workflow generates a unique "Deal ID" based on filenames to ensure data isolation and implements a caching layer via Pinecone to avoid redundant parsing.
4. Intelligence Layer:
   - Web Enrichment: The workflow derives the target company name and uses Decodo to scrape official websites for "About" and "Commercial Risk" data.
   - Multi-Query RAG: An OpenAI-powered agent executes six specific retrieval queries (Financials, Risks, Business Model, etc.) to gather evidence from all sources.
5. Output & Delivery: Analysis is mapped to a structured template, rendered into a professional HTML report, and converted to a PDF using Puppeteer.
6. Storage & Logging: The final report is uploaded to Cloudflare R2, and a public URL is returned to the user.

Setup Guide

Prerequisites

| Requirement | Type | Purpose |
| --- | --- | --- |
| n8n instance | Essential | Core automation and workflow orchestration |
| LlamaIndex Cloud | Essential | High-accuracy document parsing (LlamaParse) |
| Pinecone | Essential | Vector database for document and web evidence storage |
| OpenAI API | Essential | LLM for embeddings and expert analysis (Embedding Small & GPT-5.2) |
| Decodo API | Essential | Real-time web searching and markdown scraping |
| R2 Bucket | Essential | Secure storage for the generated PDF reports |

Installation Steps
1. Import the JSON file to your n8n instance.
2. Configure credentials:
   - OpenAI: Add your API key for embeddings and the Chat Model.
   - Pinecone: Enter your API Key and Index name (default: poc).
   - LlamaIndex: Add your API key under Header Auth (Authorization: Bearer YOUR_KEY).
   - Decodo: Set up your Decodo API credentials for web search and scraping.
   - AWS S3: Configure your bucket name and access keys (the R2 bucket is accessed through its S3-compatible API).
3. Update environment-specific values: In the "Build Public Report URL" node, update the baseUrl to match your bucket's public endpoint or CDN.
4. Test execution: Send a POST request to the webhook URL with a binary file (e.g., a Pitch Deck) to verify the end-to-end generation.
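The "Deal ID" and caching idea from the Processing step can be sketched in Python. The description only says the ID is based on filenames; hashing the normalized filenames with SHA-256 is one possible way to do that and is an assumption here, as are the hypothetical names `make_deal_id` and `needs_parsing`.

```python
import hashlib


def make_deal_id(filenames: list[str]) -> str:
    """Derive a stable ID from the uploaded filenames (order-insensitive)."""
    normalized = "|".join(sorted(name.strip().lower() for name in filenames))
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"deal-{digest[:12]}"


def needs_parsing(deal_id: str, cached_ids: set[str]) -> bool:
    """Skip the parsing step when this deal was already ingested."""
    return deal_id not in cached_ids
```

Because the same set of files always yields the same ID, the ID can double as a vector store namespace and as the cache key that avoids re-parsing.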
Technical Details

Core Nodes

| Node | Purpose | Key Configuration |
| --- | --- | --- |
| LlamaParse (HTTP) | Document Conversion | Uses the /parsing/upload and /job/result endpoints for high-fidelity markdown |
| Pinecone Vector Store | Context Storage | Implements namespace-based isolation using the unique dealId |
| Decodo Search/Scrape | Web Intelligence | Dynamically identifies the official domain and extracts corporate metadata |
| AI Agent | Strategic Analysis | Configured with a "Senior Investment Analyst" system prompt and 6-step retrieval logic |
| Puppeteer | PDF Generation | Renders the styled HTML report into a print-ready A4 PDF |

Workflow Logic

The workflow uses a Multi-Query Retrieval strategy. Instead of asking one generic question, the AI Agent is forced to perform six distinct searches against the vector database (Revenue History, Key Risks, etc.). This makes it less likely that the AI misses critical financial tables or risk disclosures buried in the text, even when a document is 100 pages long.

Customization Options

Basic Adjustments
- **Report Styling:** Edit the "Render DD Report HTML" node to match your firm's branding (logo, colors, fonts).
- **Analysis Scope:** Modify the AI Agent's prompt to include specific metrics (e.g., "ESG Score" or "Technical Debt Assessment").

Advanced Enhancements
- **Slack/Email Integration:** Instead of just an S3 link, have n8n send the PDF directly to a #new-deals Slack channel.
- **CRM Sync:** Automatically create a new record in HubSpot or Salesforce with the structured JSON output attached.
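The Multi-Query Retrieval strategy described under Workflow Logic can be sketched in Python as a loop over fixed queries against a namespace-scoped search function. This is an illustration under assumptions: only some of the six topics are named in the text, so the entries marked hypothetical are placeholders, and `search` stands in for whatever vector store call the workflow actually makes.

```python
from typing import Callable

QUERIES = [
    "Revenue history and financials",
    "Key risks",
    "Business model",
    "Market and competition",  # hypothetical placeholder
    "Management team",  # hypothetical placeholder
    "Customer concentration",  # hypothetical placeholder
]


def gather_evidence(
    search: Callable[[str, str, int], list[str]],
    deal_id: str,
    top_k: int = 5,
) -> dict[str, list[str]]:
    """Run each query in the deal's namespace and collect de-duplicated chunks."""
    seen: set[str] = set()
    evidence: dict[str, list[str]] = {}
    for query in QUERIES:
        fresh = []
        for chunk in search(query, deal_id, top_k):
            if chunk not in seen:  # keep each chunk under the first query that found it
                seen.add(chunk)
                fresh.append(chunk)
        evidence[query] = fresh
    return evidence
```

Running several narrow queries instead of one broad one gives each topic its own top_k budget, which is the reason the description gives for buried tables and risk disclosures being less likely to be missed.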
Troubleshooting

| Problem | Cause | Solution |
| --- | --- | --- |
| Parsing Timeout | File is too large for synchronous processing | Increase the "Wait" node duration or check LlamaParse job limits |
| Low Analysis Quality | Insufficient context in documents | Ensure documents are text-based PDFs (not scans) or enable OCR in LlamaParse |
| PDF Layout Broken | CSS incompatibility in Puppeteer | Simplify CSS in the HTML node; avoid complex Flexbox/Grid if Puppeteer version is older |

Use Case Examples

Scenario 1: Venture Capital Deal Screening

- Challenge: A VC associate receives 20 pitch decks a day and spends hours manually summarizing company profiles.
- Solution: This workflow parses the deck and web-scrapes the startup's site to verify claims.
- Result: The associate receives a 3-page PDF summary for every deck, allowing them to reject or move forward in seconds.

Scenario 2: Private Equity Due Diligence

- Challenge: Analyzing a 150-page CIM (Confidential Information Memorandum) for specific financial "red flags."
- Solution: The AI Agent is programmed to specifically hunt for customer concentration and margin fluctuations.
- Result: Consistent risk identification across all deals, regardless of which analyst is assigned to the project.

Created by: Khmuhtadin
Category: Business Intelligence | Tags: Decodo, AI, RAG, Due Diligence, LlamaIndex, Pinecone
by Emir Belkahia
This workflow helps Customer Success Managers and other customer-facing professionals quickly gather intelligence on clients or prospects by analyzing their recent LinkedIn activity via a simple Slack command.

Who's it for

CSMs, Account Managers, and Sales professionals who need fast, structured insights about a person's LinkedIn presence before a call, meeting, or outreach.

What it does (and doesn't do)

✅ It DOES:

- Fetch recent LinkedIn posts from any profile
- Analyze posting frequency and cadence patterns
- Identify top themes and focus areas
- Extract recent highlights with context
- Generate a clean HTML report sent via email

❌ It DOESN'T:

- Access private/non-public LinkedIn content
- Provide real-time updates (it's a snapshot)
- Replace in-depth research when that is needed

Think of it as: Your personal LinkedIn research assistant that turns a name into actionable intelligence in under a minute.

How it works

1. Slack command - Type /check-linkedin [Full Name] in Slack
2. Name validation - AI verifies you provided a full name (not just "John")
3. Profile discovery - Finds the correct LinkedIn profile via Apify
4. Content scraping - Pulls their recent posts (last 20)
5. AI analysis - GPT-4.1 analyzes posting patterns, topics, and highlights
6. Report generation - Creates a formatted HTML email report
7. Email delivery - Sends the intelligence brief to your inbox

Set up steps

Setup time: ~15 minutes

1. Create or use your existing Slack app and add a Slash Command (this can be done at https://api.slack.com/apps)
2. Configure the webhook URL in your Slack app
3. Connect credentials: Slack OAuth, Apify API, OpenAI API, Gmail OAuth
4. Update the email recipient in the "Send report via Email" node
5. Test with a known LinkedIn profile

Requirements

- Slack workspace (with app installation permissions)
- Apify account with credits
- OpenAI API key (GPT-4.1 access)
- Gmail account
- Apify actors: LinkedIn Profile Finder, LinkedIn Post Scraper

Cost estimation

~$0.05-0.09 per profile check. You could research 11-20 people for $1.
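The "11-20 people for $1" figure follows directly from the per-check range, as the short calculation below shows. It works in whole cents to avoid floating-point rounding. The helper names are made up for this example, and the optional fixed monthly fee is a parameter you would fill in from your own Apify plan.

```python
def lookups_per_budget(budget_cents: int, cost_per_check_cents: int) -> int:
    """How many whole profile checks fit into a budget (all amounts in cents)."""
    return budget_cents // cost_per_check_cents


def monthly_cost_cents(checks: int, cost_per_check_cents: int, fixed_fee_cents: int = 0) -> int:
    """Variable cost for a month plus an optional fixed actor subscription fee."""
    return checks * cost_per_check_cents + fixed_fee_cents


if __name__ == "__main__":
    # $1 budget at the high (9 cents) and low (5 cents) end of the quoted range.
    print(lookups_per_budget(100, 9), "to", lookups_per_budget(100, 5), "checks per dollar")
    # Hypothetical month: 100 checks at 9 cents each with no fixed fee.
    print(monthly_cost_cents(100, 9) / 100, "USD")
```

If one of your actors carries a flat monthly charge, pass it as `fixed_fee_cents`; at low volumes that fee can dominate the per-check cost.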
⚠️ Cost Disclaimer: The costs displayed above are indicative only and may vary significantly depending on which Apify actors you select. Some actors incur monthly charges; for example, one of the two actors used in this workflow costs $35/month. I recommend using this actor only when there's a clear business need for it. For cost optimization, consider switching to alternative actors that deliver similar or simpler functionality at a lower cost. If you plan to use this workflow extensively, I strongly suggest performing a budget assessment and evaluating other actor options to maximize cost efficiency.

The workflow uses GPT-4.1-mini for lightweight classification and GPT-4.1 for the heavy analysis to balance quality and cost.

Known Limitations

Common names have limited accuracy: Very common names (e.g., "John Smith") often fail to identify the correct person. An improved version could accept a company name in the slash command as an additional input to narrow down results and improve first-try matching accuracy.

💡 Pro tips

- Check before important meetings: Run this 15-30 minutes before a call. The email report gives you conversation starters and context about what they care about.
- Batch your research: If you have multiple clients or prospects, queue them up. Just remember each lookup costs ~$0.05-0.09.
- Watch your Apify credits: The LinkedIn scrapers are the main cost driver. Monitor your Apify usage if you're doing high volume.
- Don't spam the same profile: LinkedIn may rate-limit. Space out repeat checks on the same person by at least a few hours.
- Review the "Quick Scan" section first: The email report starts with key stats and top focus areas. Perfect for a 30-second pre-call prep.
What to do after the workflow runs

1. Check your email - Report arrives in 30-90 seconds
2. Review the report - Latest post date, cadence, and top themes
3. Read Recent Activity Summary - High-level overview of their content
4. Dive into Detailed Analysis - Two main topics with keywords and rationale
5. Use it strategically:
   - Reference their recent posts in your outreach
   - Ask about topics they're clearly passionate about
   - Tailor your pitch to their demonstrated interests
   - Avoid generic "saw you on LinkedIn" messages

Questions or Feedback?

📧 emir.belkahia@gmail.com
💼 linkedin.com/in/emirbelkahia
by Pawan
This template sets up a scheduled automation that scrapes the latest news from The Hindu website, uses a Google Gemini AI Agent to filter and analyze the content for relevance to the syllabus of competitive exams such as the UPSC Civil Services Examination (CSE), and compiles a structured daily digest directly into a Google Sheet. It saves hours of manual reading and note-taking by providing concise summaries, subject categorization, and explicit UPSC importance notes.

Who’s it for

This workflow is designed for:

- UPSC/CSE Aspirants who require a curated, focused, and systematic daily current affairs digest.
- Coaching Institutes aiming to instantly generate structured, high-quality study material for their students.
- Educators and Content Creators focused on Governance, Economy, International Relations, and Science & Technology.

How it works / What it does

This workflow runs automatically every morning (scheduled for 7 AM by default) to generate a ready-to-study current affairs document.

1. Scraping: The Schedule Trigger fires an HTTP Request to fetch the latest news links from The Hindu's front page.
2. Data Curation: The HTML and Code in JavaScript nodes work together to extract and pair every article URL with its title.
3. Content Retrieval: For each identified link, a second HTTP Request node fetches the entire article body.
4. AI Analysis and Filtering: The AI Agent uses a detailed prompt and the Google Gemini Chat Model to perform two critical tasks:
   - Filter: It filters out all irrelevant articles (e.g., sports results, local crime) to keep only the 5-6 most important UPSC-relevant pieces (Polity, Economy, IR, etc.).
   - Analyze: For the selected articles, it generates a Brief Summary, identifies the Main Subject, and clearly articulates Why it is Important for the UPSC Exam.
5. Storage: The AI Agent calls the integrated Google Sheets Tool to automatically append the structured, analyzed data into your designated Google Sheet, creating your daily ready-made notes.
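The Data Curation step above pairs each article URL with its title. The workflow does this with n8n's HTML and Code nodes; the sketch below shows the same idea in plain Python using only the standard library, so it can be tried outside n8n. The class and function names are hypothetical, and the assumption that every article appears as an `<a href>` with the title as link text is a simplification of the real page markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTitleParser(HTMLParser):
    """Collect (url, title) pairs from anchor tags; assumes the link text is the title."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pairs: list[dict[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so multi-line link text becomes one clean title.
            title = " ".join("".join(self._text).split())
            if title:
                self.pairs.append({"url": self._href, "title": title})
            self._href = None


def extract_articles(html: str, base_url: str) -> list[dict[str, str]]:
    """Return unique url/title pairs in page order, keeping the first title per URL."""
    parser = LinkTitleParser(base_url)
    parser.feed(html)
    parser.close()
    seen: set[str] = set()
    unique = []
    for pair in parser.pairs:
        if pair["url"] not in seen:
            seen.add(pair["url"])
            unique.append(pair)
    return unique
```

Deduplicating by URL keeps the later article fetches from requesting the same page twice when a story is linked from several places on the front page.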
Requirements

To deploy this workflow, you need:

- n8n Account: Cloud or self-hosted.
- Google Gemini API Key: For connecting the Google Gemini Chat Model and powering the AI Agent.
- Google Sheets Credentials: For reading/writing the final compiled digest.
- Target Google Sheet: A spreadsheet with the following columns: Date, URL, Subject, Brief Summary, and What is Important.

How to set up

- **Credentials Setup:** Connect your Google Gemini and Google Sheets accounts via the n8n Credentials Manager.
- **Google Sheet Linking:** In the **Append row in sheet** and **Append row in sheet in Google Sheets1** nodes, replace the placeholder IDs and GIDs with the actual ID and sheet name of your dedicated UPSC notes spreadsheet.
- **Scheduling:** Adjust the time in the **Schedule Trigger: Daily at 7 AM** node if you want the daily analysis to run at a different hour.
- **AI Customization (Optional):** You can refine the System Message in the **AI Agent: Filter & Analyze UPSC News** node to focus the analysis on specific exam phases (e.g., Prelims only) or adjust the priority of subjects.
by Ravi Patel
Quick Overview

This manually triggered workflow reads a list of webpage URLs from Google Sheets, scrapes each page with ScrapingBee, and uses Google Gemini to extract structured product data from screenshots with an HTML fallback, then appends the results to a Google Sheets sheet.

How it works

1. Runs when you manually trigger the workflow.
2. Reads the list of URLs to scrape from a Google Sheets spreadsheet.
3. Fetches a full-page screenshot for each URL using the ScrapingBee API.
4. Sends the screenshot (and the URL for context) to a Google Gemini model to extract product details into a structured JSON format, calling a ScrapingBee HTML fetch tool when screenshot extraction is incomplete.
5. Converts any fetched HTML to Markdown and returns it to the Gemini agent to complete the extraction.
6. Splits the extracted product array into individual items and appends them as new rows in the Google Sheets “Results” sheet.

Setup

- Create a Google Sheets service account connection in n8n and set the target spreadsheet and the “List of URLs” and “Results” sheet selections.
- Add a Google Gemini (PaLM) API credential in n8n and ensure the selected model (gemini-1.5-pro-latest) is available for your account.
- Add your ScrapingBee API key in both ScrapingBee HTTP requests (screenshot and HTML) and confirm the target URLs are reachable from your n8n environment.
- Ensure the “Results” sheet columns match the structured output fields (for example: product_title, product_price, product_brand, promo, promo_percentage/promo_percent) or update the output schema and column mappings accordingly.
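The last step, splitting the extracted product array into one row per product, can be sketched as follows. This is an illustration under assumptions rather than the workflow's own node logic: the column list mirrors the example field names given in the setup notes, and folding `promo_percent` into `promo_percentage` is one possible way to handle the two spellings mentioned there.

```python
# Column names taken from the example fields in the setup notes;
# adjust them to match your own "Results" sheet.
COLUMNS = ["product_title", "product_price", "product_brand", "promo", "promo_percentage"]


def to_rows(products: list[dict]) -> list[dict]:
    """Turn the model's product array into one flat row per product."""
    rows = []
    for product in products:
        # Missing fields become empty cells instead of raising an error.
        row = {column: product.get(column, "") for column in COLUMNS}
        # Assumption: accept the alternate key spelling when the main one is absent.
        if row["promo_percentage"] == "" and "promo_percent" in product:
            row["promo_percentage"] = product["promo_percent"]
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = [
        {"product_title": "Kettle", "product_price": "29.99", "promo": True, "promo_percent": 10},
        {"product_title": "Toaster", "product_price": "45.00", "product_brand": "Acme"},
    ]
    for row in to_rows(sample):
        print(row)
```

Fixing the column set up front keeps every appended row aligned with the sheet header even when the model omits a field for some products.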
by Siddharth Gupta
Quick overview

This workflow scans HTML files in a Google Drive folder, extracts and stores page text in Postgres, generates local vector embeddings with Ollama, and uses PGVector similarity searches to produce CSV reports that flag semantically duplicate website pages.

How it works

1. Starts manually and clears the existing PGVector embeddings table and the scraped page text table in Postgres.
2. Lists files in a specified Google Drive folder, filters to the target documents, and processes them in batches.
3. Downloads each HTML file from Google Drive, extracts the main body text, cleans it, and upserts the results into a Postgres table for scraped pages.
4. Reads the scraped page text back from Postgres in batches, splits it into overlapping chunks, and attaches page metadata (sheet_id, file_name, file_url) to each chunk.
5. Generates embeddings locally with Ollama and inserts the chunk vectors and metadata into Postgres (PGVector), deduplicating already-processed pages.
6. Builds an HNSW index in Postgres, computes chunk-to-chunk similarity matches and a pairwise page report, and exports the results as a CSV file.
7. Computes page-level centroid embeddings, finds highly similar page pairs, and exports a page-level duplicate report as a CSV file.

Setup

- Add Google Drive OAuth2 credentials and set the Google Drive folder URL/ID used to scan for your HTML files.
- Add Postgres credentials for a database with the pgvector extension enabled and permissions to create/alter tables and indexes (including HNSW indexes).
- Add an Ollama credential and ensure the embedding model mxbai-embed-large:latest is available on your Ollama instance.
- Confirm your source files are HTML documents and that the workflow’s text extraction and similarity thresholds match your content and desired duplicate sensitivity.

Requirements

- Working instance of n8n, either self-hosted or on the cloud. Remember, this workflow can be computationally expensive.
- Google Drive API (with OAuth setup in the n8n credentials section)
- Ollama (for open source models) or any embedding model API
- PostgreSQL with PGVector or any other vector database
- PgAdmin (for PostgreSQL) or another interface to access database tables via SQL for troubleshooting (optional)

Additional info

Limitations and Enhancements:

Physical system memory

Running mxbai-embed-large through Ollama is free and private, but the embedding generation speed depends entirely on your hardware. The more system memory you have, the more data you can process per batch in the loop node.

Similarity threshold and boilerplate content

The cosine distance threshold used in this workflow is 0.15 for chunk-level matching and 0.05 (similarity above 95%) for page-level centroid matching. These are only starting points. Once you have the data, and especially if your data has more noise, you might need to tweak these thresholds for better matching.

This workflow needs HTML files to extract text

This workflow doesn't crawl a website or fetch pages from a URL. You need to download the HTML files (rendered or source) yourself and place them in the Google Drive folder.

Use parallel processing and Cloud APIs

Two sub-processes take the most time:

- Downloading HTML files from Google Drive
- Creating vector embeddings

If you can run these sub-processes in parallel in n8n, the workflow will finish much faster. Additionally, using a cloud API for embeddings may save you some processing time as well.

Use efficient SQL queries

Since I am from a non-tech background and not a coder, I used a mix of Gemini, Perplexity and Claude to write the SQL queries for this workflow. If you're better at SQL, you can write more efficient queries that achieve better results with less computation and time.
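To make the page-level threshold concrete, the sketch below averages a page's chunk vectors into a centroid and compares two pages by cosine distance against the 0.05 starting value mentioned above. It is a plain-Python illustration with made-up two-dimensional vectors; the workflow itself performs this comparison in Postgres with PGVector, and the function names here are hypothetical.

```python
import math

# Starting thresholds quoted in the description; tune them for your own content.
CHUNK_DISTANCE_THRESHOLD = 0.15
PAGE_DISTANCE_THRESHOLD = 0.05


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of a page's chunk embeddings."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 minus cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def is_duplicate_page(chunks_a: list[list[float]], chunks_b: list[list[float]],
                      threshold: float = PAGE_DISTANCE_THRESHOLD) -> bool:
    """Flag two pages as duplicates when their centroids are closer than the threshold."""
    return cosine_distance(centroid(chunks_a), centroid(chunks_b)) < threshold


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real mxbai-embed-large vectors are much longer.
    page_a = [[1.0, 0.0], [0.9, 0.1]]
    page_b = [[1.0, 0.05], [0.95, 0.05]]
    page_c = [[0.0, 1.0], [0.1, 0.9]]
    print(is_duplicate_page(page_a, page_b), is_duplicate_page(page_a, page_c))
```

Because a centroid blends every chunk on the page, shared boilerplate such as headers and footers pulls unrelated pages closer together, which is one reason the threshold may need adjusting for noisy sites.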