by DIGITAL BIZ TECH
# Travel Reimbursement - OCR & Expense Extraction Workflow

## Overview
This is a lightweight n8n workflow that accepts chat input and uploaded receipts, runs OCR, stores parsed results in Supabase, and uses an AI agent to extract structured travel expense data and compute totals. Designed for zero-retention operation and fast integration.

## Workflow Structure
- **Frontend:** Chat UI trigger that accepts text and file uploads.
- **Preprocessing:** Binary normalization + per-file OCR request.
- **Storage:** Store OCR-parsed blocks in the Supabase `temp_table`.
- **Core AI:** Travel reimbursement agent that extracts fields, infers missing values, and calculates totals using the Calculator tool.
- **Output:** Agent responds to the chat with a concise expense summary and breakdowns.

## Chat Trigger (Frontend)
- **Trigger node:** When chat message received
- `public: true`, `allowFileUploads: true`; the `sessionId` is used to tie uploads to the chat session.
- Custom CSS + initial messages configured for user experience.

## Binary Presence Check
- **Node:** CHECK IF BINARY FILE IS PRESENT OR NOT (IF)
- Checks whether the incoming payload contains files.
- If files are present -> route to Split Out -> NORMALIZE binary file -> OCR (ANY OCR API) -> STORE OCR OUTPUT -> Merge.
- If no files -> route directly to Merge -> Travel reimbursement agent.

## Binary Normalization
- **Nodes:** Split Out and NORMALIZE binary file (Code)
- Split Out extracts binary entries into a `data` field.
- NORMALIZE binary file picks the first binary key and rewrites the payload to `binary.data` for a consistent downstream shape (a minimal sketch of this Code node appears after the Data Flow Summary below).

## OCR
- **Node:** OCR (ANY OCR API) (HTTP Request)
- Sends multipart/form-data to the OCR endpoint; expects JSONL or JSON with blocks.
- Body includes `mode=single`, `output_type=jsonl`, `include_images=false`.

## Store OCR Output
- **Node:** STORE OCR OUTPUT (Supabase)
- Upserts into `temp_table` with `session_id`, parsed blocks, and `file_name`.
- Used by the agent to fetch previously uploaded receipts for the same session.

## Memory & Tooling
- **Nodes:** Simple Memory and Simple Memory1 (memoryBufferWindow) keep the last 10 messages for session context.
- **Node:** Calculator1 (toolCalculator) is used by the agent to sum multiple charges and handle currency arithmetic and totals.

## Travel Reimbursement Agent (Core)
- **Node:** Travel reimbursement agent (LangChain agent)
- **Model:** Mistral Cloud Chat Model (mistral-medium-latest)
- **Behavior:**
  - Parse OCR blocks and non-file chat input.
  - Extract required fields: `vendor_name`, `category`, `invoice_date`, `checkin_date`, `checkout_date`, `time`, `currency`, `total_amount`, `notes`, `estimated`.
  - When fields are missing, infer logically and mark `estimated: true`.
  - Use the Calculator tool to sum totals across multiple receipts.
  - Fetch stored OCR entries from Supabase when the user asks for session summaries.
  - Always attempt extraction; never reply with "unclear" or ask for a reupload unless the user requests audit-grade precision.
  - Final output: a clean expense table and Grand Total formatted for chat.

## Data Flow Summary
1. User sends a chat message, with or without a file.
2. If a file is present -> Split Out -> Normalize -> OCR -> Store OCR output -> Merge with chat payload.
3. The Travel reimbursement agent consumes the merged item, extracts fields, uses the Calculator tool for sums, and replies with a formatted expense summary.
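As a reference for the Binary Normalization step, here is a minimal sketch of what the NORMALIZE binary file Code node might contain. It assumes the standard n8n Code node item shape (`$input.all()` returning items with `json` and `binary` properties); the field names written into `json` are illustrative, not taken from the template.

```javascript
// Minimal sketch of the "NORMALIZE binary file" Code node (Run Once for All Items).
// For each incoming item, take the first binary key (whatever the upload was named)
// and re-expose it as binary.data so every downstream node sees the same shape.
const normalized = [];

for (const item of $input.all()) {
  const binaryKeys = Object.keys(item.binary ?? {});
  if (binaryKeys.length === 0) {
    // No file on this item: pass it through untouched.
    normalized.push(item);
    continue;
  }

  const firstKey = binaryKeys[0];
  normalized.push({
    json: {
      ...item.json,
      file_name: item.binary[firstKey].fileName ?? 'upload',
    },
    binary: {
      data: item.binary[firstKey], // consistent downstream shape: binary.data
    },
  });
}

return normalized;
```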
## Integrations Used

| Service | Purpose | Credential |
|---------|---------|------------|
| Mistral Cloud | LLM for agent | Mistral account |
| Supabase | Store parsed OCR blocks and session data | Supabase account |
| OCR API | Text extraction from images/PDFs | Configurable HTTP endpoint |
| n8n Core | Flow control, parsing, editing | Native |

## Agent System Prompt Summary
> You are a Travel Expense Extraction and Calculation AI. Extract vendor, dates, currency, category, and total amounts from uploaded receipts, invoices, hotel bills, PDFs, and images. Infer values when necessary and mark them as estimated. When asked, fetch session entries from Supabase and compute totals using the Calculator tool. Respond in a concise, business-professional format with a category-wise breakdown and a Grand Total. Never reply "unclear" or ask for a reupload unless explicitly asked. Required final response format example:

## Key Features
- Zero-retention-friendly design: OCR output stored only in `temp_table`, per session.
- Robust extraction with inference when OCR quality is imperfect.
- Session aware: the agent retrieves stored receipts for consolidated totals.
- Calculator integration for accurate numeric sums and currency handling.
- Configurable OCR endpoint so you can swap providers without changing logic.

## Setup Checklist
1. Add Mistral Cloud and Supabase credentials.
2. Configure the OCR endpoint to accept multipart uploads and return blocks.
3. Create the `temp_table` schema with `session_id`, `file`, `file_name`.
4. Test with single receipts, multipage PDFs, and mixed uploads.
5. Validate agent responses and Calculator totals.

## Summary
A practical n8n workflow for travel expense automation: accept receipts, run OCR, store parsed data per session, extract structured fields via an AI agent, compute totals, and return clean expense summaries in chat. Built for reliability and easy integration.

## Need Help or More Workflows?
We can integrate this into your environment, tune the agent prompt, or adapt it for different OCR providers. We can help you set it up for free — from connecting credentials to deploying it live.

- Contact: shilpa.raju@digitalbiz.tech
- Website: https://www.digitalbiz.tech
- LinkedIn: https://www.linkedin.com/company/digital-biz-tech/

You can also DM us on LinkedIn for any help.
by Trung Tran
# Try It Out!

## HireMind – AI-Driven Resume Intelligence Pipeline

This n8n template demonstrates how to automate resume screening and evaluation using AI to improve candidate processing and reduce manual HR effort. A smart and reliable resume screening pipeline for modern HR teams.

This workflow combines Google Drive (JD & CV storage), OpenAI (GPT-4-based evaluation), Google Sheets (position mapping + result log), and Slack/SendGrid integrations for real-time communication. Automatically extract, evaluate, and track candidate applications with clarity and consistency.

## How it works
1. A candidate submits their application using a form that includes name, email, CV (PDF), and a selected job role.
2. The CV is uploaded to Google Drive for record-keeping and later reference.
3. The Profile Analyzer Agent reads the uploaded resume, extracts structured candidate information, and transforms it into a standardized JSON format using GPT-4 and a custom output parser.
4. The corresponding job description PDF is automatically retrieved from a Google Sheet based on the selected job role (a hedged sketch of this matching step appears at the end of this template description).
5. The HR Expert Agent evaluates the candidate profile against the job description using another GPT-4 model, generating a structured assessment that includes strengths, gaps, and an overall recommendation.
6. The evaluation result is parsed and formatted for output.
7. The evaluation score marks the candidate as qualified or unqualified; based on the result, an email is sent to the applicant or a message is sent to the hiring team for the next step in the process.
8. The final evaluation result is stored in a Google Sheet for long-term tracking and reporting.

## Google Drive structure

```
├── jd                    # Google Drive folder to store your JDs (PDF)
│   ├── Backend_Engineer.pdf
│   ├── Azure_DevOps_Lead.pdf
│   └── ...
│
├── cv                    # Google Drive folder where the workflow uploads candidate resumes
│   ├── John_Doe_DevOps.pdf
│   ├── Jane_Smith_FullStack.pdf
│   └── ...
│
├── Positions (Sample: https://docs.google.com/spreadsheets/d/1pW0muHp1NXwh2GiRvGVwGGRYCkcMR7z8NyS9wvSPYjs/edit?usp=sharing)
│                         # 📋 Mapping Table: Job Role ↔ Job Description (Link)
│   └── Columns:
│       - Job Role
│       - Job Description File URL (PDF in jd/)
│
└── Evaluation form (Google Sheet)   # ✅ Final AI Evaluation Results
```

## How to use
1. Set up credentials and integrations:
   - Connect your OpenAI account (GPT-4 API).
   - Enable Google Cloud APIs:
     - Google Sheets API (for reading job roles and saving evaluation results)
     - Google Drive API (for storing CVs and job descriptions)
   - Set up SendGrid (to send email responses to candidates).
   - Connect Slack (to send messages to the hiring team).
2. Prepare your Google Drive structure:
   - Create a root folder, then inside it create:
     - `/jd` → Store all job descriptions in PDF format
     - `/cv` → This is where candidate CVs will be uploaded automatically
   - Create a Google Sheet named Positions with the following structure:

     | Job Role | Job Description Link |
     |------------------------------|----------------------------------------|
     | Azure DevOps Engineer | https://drive.google.com/xxx/jd1.pdf |
     | Full-Stack Developer (.NET) | https://drive.google.com/xxx/jd2.pdf |

3. Update your application form:
   - Use the built-in form, or connect your own (e.g., Typeform, Tally, Webflow, etc.).
   - Ensure the Job Role dropdown matches the roles in the Positions sheet exactly.
4. Run the AI workflow. When a candidate submits the form:
   - Their CV is uploaded to the `/cv` folder.
   - The job role is used to match the JD from `/jd`.
   - The Profile Analyzer Agent extracts candidate info from the CV.
   - The HR Expert Agent evaluates the candidate against the matched JD using GPT-4.
5. Distribute and store results:
   - Store the evaluation results in the Evaluation form Google Sheet.
   - Optionally notify your team:
     - ✉️ Send an email to the candidate using SendGrid.
     - 💬 Send a Slack message to the hiring team with a summary and next steps.

## Requirements
- OpenAI GPT-4 account for both the Profile Analyzer and HR Expert Agents
- Google Drive account (for storing CVs and the evaluation sheet)
- Google Sheets API credentials (for the JD source and evaluation results)

## Need Help?
Join the n8n Discord or ask in the n8n Forum!

Happy Hiring! 🚀
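To illustrate step 4 of How it works, here is a hypothetical sketch of a Code node that matches the submitted job role against rows read from the Positions sheet and returns the JD file link. The node name "Read Positions", the input field `jobRole`, and the exact row shape are assumptions; adapt them to your workflow.

```javascript
// Hypothetical sketch: match the selected job role to its JD link.
// "Read Positions" is a placeholder name for the Google Sheets node that reads
// the Positions sheet; adjust names and field keys to your workflow.
const submission = $input.first().json;          // e.g. { name, email, jobRole: "Azure DevOps Engineer" }
const positions = $('Read Positions').all();     // rows from the Positions sheet

const match = positions.find(
  (row) =>
    (row.json['Job Role'] || '').trim().toLowerCase() ===
    (submission.jobRole || '').trim().toLowerCase()
);

if (!match) {
  // Fail loudly so the execution log shows which role did not map to a JD.
  throw new Error(`No job description found for role: ${submission.jobRole}`);
}

return [{
  json: {
    ...submission,
    jdUrl: match.json['Job Description Link'],   // PDF stored in the jd/ Drive folder
  },
}];
```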
by Pinecone
## Try it out
This n8n workflow template lets you chat with your Google Drive documents (.docx, .json, .md, .txt, .pdf) using OpenAI and Pinecone Assistant. It retrieves relevant context from your files in real time so you can get accurate, context-aware answers about your proprietary data—without the need to train your own LLM.

## What is Pinecone Assistant?
Pinecone Assistant allows you to build production-grade chat and agent-based applications quickly. It abstracts the complexities of implementing retrieval-augmented generation (RAG) systems by managing the chunking, embedding, storage, query planning, vector search, model orchestration, and reranking for you.

## Prerequisites
- A Pinecone account and API key
- A GCP project with the Google Drive API enabled and configured
  - Note: When setting up the OAuth consent screen, skip steps 8-10 if running on localhost
- An OpenAI account and API key

## Setup
1. Create a Pinecone Assistant in the Pinecone Console
   - Name your Assistant n8n-assistant and create it in the United States region
   - If you use a different name or region, update the related nodes to reflect these changes
   - No need to configure a Chat model or Assistant instructions
2. Set up your Google Drive OAuth2 API credential in n8n
   - In the File added node -> Credential to connect with, select Create new credential
   - Set the Client ID and Client Secret from the values generated in the prerequisites
   - Set the OAuth Redirect URL from the n8n credential in the Google Cloud Console (instructions)
   - Name this credential Google Drive account so that other nodes reference it
3. Set up the Pinecone API key credential in n8n
   - In the Upload file to assistant node -> PineconeApi section, select Create new credential
   - Paste your Pinecone API key in the API Key field
4. Set up the Pinecone MCP Bearer auth credential in n8n
   - In the Pinecone Assistant node -> Credential for Bearer Auth section, select Create new credential
   - Set the Bearer Token field to the Pinecone API key used in the previous step
5. Set up the OpenAI credential in n8n
   - In the OpenAI Chat Model node -> Credential to connect with, select Create new credential
   - Set the API Key field to your OpenAI API key
6. Add your files to a Drive folder named n8n-pinecone-demo in the root of your My Drive
   - If you use a different folder name, you'll need to update the Google Drive triggers to reflect that change
7. Activate the workflow or test it with a manual execution to ingest the documents
8. Chat with your docs!

## Ideas for customizing this workflow
- Customize the System Message on the AI Agent node to your use case to indicate what kind of knowledge is stored in Pinecone Assistant
- Change the top_k value of results returned from the Assistant by adding "and should set a top_k of 3" to the System Message to help manage token consumption
- Configure the Context Window Length in the Conversation Memory node
- Swap out the Conversation Memory node for one that is more persistent
- Make the chat node publicly available or create your own chat interface that calls the chat webhook URL (a hedged example of such a call appears at the end of this template description)

## Need help?
You can find help by asking in the Pinecone Discord community, asking on the Pinecone Forum, or filing an issue on this repo.
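If you build your own chat interface, a POST to the workflow's chat webhook is all that is needed. Below is a minimal, hypothetical sketch: the webhook URL is a placeholder, and the `chatInput`/`sessionId` field names and response shape depend on your Chat Trigger configuration, so verify them against your own node settings.

```javascript
// Hypothetical sketch: send a message to the n8n chat webhook from a custom frontend.
// Replace CHAT_WEBHOOK_URL with the production webhook URL of your chat trigger.
const CHAT_WEBHOOK_URL = 'https://your-n8n-instance/webhook/REPLACE-ME/chat';

async function askAssistant(message, sessionId) {
  const res = await fetch(CHAT_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // Field names assumed from a default chat trigger payload; verify in your workflow.
    body: JSON.stringify({ chatInput: message, sessionId }),
  });
  if (!res.ok) throw new Error(`Chat webhook returned ${res.status}`);
  return res.json(); // expected to contain the assistant's reply
}

// Example usage:
askAssistant('What does our onboarding doc say about SSO?', 'demo-session-1')
  .then((reply) => console.log(reply))
  .catch(console.error);
```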
by Akshay
## Overview
This project is an AI-powered WhatsApp virtual receptionist built using n8n, designed to handle both text and voice-based customer messages automatically. The workflow integrates Google Gemini, Pinecone, and the WhatsApp Business API to provide intelligent, context-aware responses that feel natural and professional.

## How It Works
1. **Message Detection:** The workflow begins when a message arrives on WhatsApp. It identifies whether the message is text or voice and routes it accordingly.
2. **Voice Message Handling:** Audio messages are securely downloaded from WhatsApp. The files are converted to Base64 format and sent to the Gemini API for transcription. The transcribed text is then passed to the AI Agent for further processing.
3. **AI Agent Processing:** The LangChain AI Agent acts as the brain of the system. It uses:
   - **Google Gemini Chat Model** for natural language understanding and response generation.
   - **Pinecone Vector Store** to retrieve company-specific information and product data.
   - **Memory Buffer** to remember the last 20 user messages, ensuring context-aware responses.
   The agent also follows a set of custom communication rules — replying only in approved languages, skipping greetings, and focusing on direct, helpful, and professional responses (e.g., product recommendations, support, or guidance).
4. **Knowledge Retrieval:** The AI Agent connects to a Pinecone database containing detailed company data, such as product catalogs or service FAQs. Using Gemini-generated embeddings, it retrieves the most relevant information for each user query.
5. **Response Delivery:** Once the AI Agent prepares the response, it is instantly sent back to the user via WhatsApp, completing the conversational loop.

## Who It's For
This system is ideal for businesses seeking to automate their customer communication through WhatsApp. It's especially valuable for:
- **Product-based companies** with frequent customer inquiries.
- **Service providers** offering 24/7 customer assistance or quote requests.
- **SMBs** looking to scale their communication without hiring additional staff.

## Tech Stack & Requirements
- **n8n** – Workflow automation and orchestration.
- **WhatsApp Cloud API** – For sending and receiving messages.
- **Google Gemini (PaLM)** – For LLM-based transcription and response generation.
- **Pinecone** – Vector database for product and service knowledge retrieval.
- **LangChain Integration** – For connecting memory, vector store, and reasoning tools.
- **Custom Business Rules** – Configurable within the AI Agent node to manage tone, style, and workflow behavior.

## Key Features
- Handles both text and voice messages seamlessly.
- Responds in multiple languages, including English.
- Maintains conversation memory per user session.
- Retrieves accurate company-specific information using vector search.
- Fully automated, with customizable behavior for different industries or use cases.

## Setup Instructions

### 1. Prerequisites
Before importing the workflow, ensure you have:
- An active n8n instance (self-hosted or n8n Cloud).
- **WhatsApp Cloud API credentials** from Meta.
- **Google Gemini API key** with model access (for chat and transcription).
- **Pinecone API key** with a preconfigured vector index containing your company data.

### 2. Environment Setup
- Install all required credentials under Settings → Credentials in n8n.
- Add environment variables (if applicable) for keys like:
  - `GOOGLE_API_KEY=your_google_gemini_key`
  - `PINECONE_API_KEY=your_pinecone_key`
  - `WHATSAPP_ACCESS_TOKEN=your_whatsapp_token`

### 3. Pinecone Configuration
- Create a Pinecone index named, for example, products-index.
- Upload company documents or product details as vector embeddings using Gemini or LangChain utilities.
- Adjust the retrieval limit in the Pinecone node settings for broader or narrower search responses.

### 4. WhatsApp API Configuration
- Set up a WhatsApp Business Account via the Meta Developer Dashboard.
- Create a webhook endpoint URL (n8n's public URL) to receive WhatsApp messages.
- Use the WhatsApp Trigger Node to capture messages in real time.

### 5. AI Agent Customization
You can personalize how the AI behaves by editing the system prompt inside the AI Agent node:
- Modify tone, response length, or product focus.
- Add new "rules" for language preferences or conversation flow.
- Include links or custom text output (e.g., quotation formats, product catalog messages).

### 6. Handling Voice Messages
- Ensure your WhatsApp Business Account has media message permissions enabled.
- Verify the HTTP Request node that connects to the Gemini API for transcription is correctly authenticated.
- You can adjust the transcription model or prompt if you prefer shorter, keyword-based outputs.
- A hedged sketch of the audio-to-Base64 step appears after the Customization Options below.

### 7. Testing
- Send both text and voice messages from a test WhatsApp number.
- Check response time and message formatting.
- Use n8n's execution logs to debug errors (especially for media downloads or API credentials).

## Customization Options

### 🧩 AI Behavior
- Modify the AI Agent's system message to adapt tone and personality (e.g., sales-oriented, support-driven).
- Update memory length (default: last 20 messages) for longer or shorter conversations.

### 🌍 Multi-language Support
- Add or remove allowed languages in the rules section of the AI Agent node.
- For multilingual businesses, duplicate the AI Agent path and route messages by language detection.

### 📦 Industry Adaptation
- Swap the Pinecone dataset to suit different industries — retail, hospitality, logistics, etc.
- Replace product data with FAQs, customer records, or support documentation.
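As a companion to the voice-message handling step (section 6), here is a hypothetical sketch of a Code node that turns the downloaded WhatsApp audio into a Base64 payload ready for a transcription request. The binary property name (`data`) and the output fields are assumptions; check your Gemini HTTP Request node for the exact request format it expects.

```javascript
// Hypothetical sketch: convert the downloaded WhatsApp audio to Base64 for transcription.
// Assumes the previous node stored the audio under the binary property "data".
const mimeType = $input.first().binary?.data?.mimeType || 'audio/ogg';

// getBinaryDataBuffer reads the binary payload regardless of whether n8n keeps it
// in memory or on disk; 0 is the item index, 'data' the binary property name.
const buffer = await this.helpers.getBinaryDataBuffer(0, 'data');
const audioBase64 = buffer.toString('base64');

return [{
  json: {
    mimeType,
    audioBase64,
    // A typical transcription request embeds the audio as inline data next to a
    // text instruction; verify the exact field names against the Gemini API docs.
    prompt: 'Transcribe this voice message.',
  },
}];
```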
by Don Jayamaha Jr
A fully autonomous HTX Spot Market AI Agent (Huobi AI Agent) built using GPT-4o and Telegram. This workflow is the primary interface, orchestrating all internal reasoning, trading logic, and output formatting.

## ⚙️ Core Features
- 🧠 **LLM-Powered Intelligence:** Built on GPT-4o with advanced reasoning
- ⏱️ **Multi-Timeframe Support:** 15m, 1h, 4h, and 1d indicator logic
- 🧩 **Self-Contained Multi-Agent Workflow:** No external subflows required
- 🧮 **Real-Time HTX Market Data:** Live spot price, volume, 24h stats, and order book
- 📲 **Telegram Bot Integration:** Interact via chat or schedule
- 🔄 **Autonomous Runs:** Support for webhook, schedule, or Telegram triggers

## 📥 Input Examples

| User Input | Agent Action |
| --------------- | --------------------------------------------- |
| btc | Returns 15m + 1h analysis for BTC |
| eth 4h | Returns 4-hour swing data for ETH |
| bnbusdt today | Full-day snapshot with technicals + 24h stats |

A hedged sketch of this symbol + timeframe parsing appears at the end of this template description.

## 🖥️ Telegram Output Sample

```
📊 BTC/USDT Market Summary
💰 Price: $62,400
📉 24h Stats: High $63,020 | Low $60,780 | Volume: 89,000 BTC
📈 1h Indicators:
• RSI: 68.1 → Overbought
• MACD: Bearish crossover
• BB: Tight squeeze forming
• ADX: 26.5 → Strengthening trend
📉 Support: $60,200
📈 Resistance: $63,800
```

## 🛠️ Setup Instructions
1. Create your Telegram Bot using @BotFather
2. Add the Bot Token in n8n Telegram credentials
3. Add your GPT-4o or OpenAI-compatible key under HTTP credentials in n8n
4. (Optional) Add your HTX API credentials if expanding to authenticated endpoints
5. Deploy this main workflow using:
   - ✅ Webhook (HTTP Request Trigger)
   - ✅ Telegram messages
   - ✅ Cron / Scheduled automation

## 🎥 Live Demo

## 🧠 Internal Architecture

| Component | Role |
| ------------------ | -------------------------------------------------------- |
| 🔄 Telegram Trigger | Entry point for external or manual signal |
| 🧠 GPT-4o | Symbol + timeframe extraction + strategy generation |
| 📊 Data Collector | Internal tools fetch price, indicators, order book, etc. |
| 🧮 Reasoning Layer | Merges everything into a trading signal summary |
| 💬 Telegram Output | Sends formatted HTML report via Telegram |

## 📌 Use Case Examples

| Scenario | Outcome |
| -------------------------------------- | ------------------------------------------------------- |
| Auto-run every 4 hours | Sends a new HTX signal summary to Telegram |
| Human requests "eth 1h" | Bot replies with a real-time 1h chart-based summary |
| System-wide trigger from another agent | Invokes the webhook and returns the response to the parent workflow |

## 🧾 Licensing & Attribution
© 2025 Treasurium Capital Limited Company. Architecture, prompts, and trade report structure are IP-protected. No unauthorized rebranding permitted.

🔗 For support: Don Jayamaha – LinkedIn
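In the workflow, GPT-4o handles symbol and timeframe extraction, but the same mapping can be sketched deterministically. The sketch below is hypothetical (not taken from the workflow) and assumes USDT-quoted pairs plus the timeframes listed under Core Features.

```javascript
// Hypothetical sketch: parse inputs like "btc", "eth 4h", or "bnbusdt today"
// into a symbol + timeframe pair before requesting HTX market data.
const SUPPORTED_TIMEFRAMES = ['15m', '1h', '4h', '1d'];

function parseRequest(text) {
  const [rawSymbol, rawTimeframe] = text.trim().toLowerCase().split(/\s+/);

  // Normalize bare tickers ("btc") to USDT pairs; keep full pairs ("bnbusdt") as-is.
  const symbol = rawSymbol.endsWith('usdt') ? rawSymbol : `${rawSymbol}usdt`;

  // "today" maps to the daily view; missing or unknown timeframes fall back to the
  // default 15m + 1h analysis shown in the input examples table.
  let timeframes;
  if (!rawTimeframe) timeframes = ['15m', '1h'];
  else if (rawTimeframe === 'today') timeframes = ['1d'];
  else if (SUPPORTED_TIMEFRAMES.includes(rawTimeframe)) timeframes = [rawTimeframe];
  else timeframes = ['15m', '1h'];

  return { symbol, timeframes };
}

console.log(parseRequest('eth 4h'));        // { symbol: 'ethusdt', timeframes: ['4h'] }
console.log(parseRequest('bnbusdt today')); // { symbol: 'bnbusdt', timeframes: ['1d'] }
```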
by Zain Khan
Categories: Business Automation, Customer Support, AI, Knowledge Management

This comprehensive workflow enables businesses to build and deploy a custom-trained AI Chatbot in minutes. By combining a sophisticated data scraping engine with a RAG-based (Retrieval-Augmented Generation) chat interface, it allows you to transform website content into a high-performance support agent. Powered by Google Gemini and Pinecone, this system ensures your chatbot provides accurate, real-time answers based exclusively on your business data.

## Benefits
- **Instant Knowledge Sync** - Automatically crawls sitemaps and URLs to keep your AI up-to-date with your latest website content.
- **Embeddable Anywhere** - Features a ready-to-use chat trigger that can be integrated into the bottom-right of any website via a simple script.
- **High-Fidelity Retrieval** - Uses vector embeddings to ensure the AI "searches" your documentation before answering, reducing hallucinations.
- **Smart Conversational Memory** - Equipped with a 10-message window buffer, allowing the bot to handle complex follow-up questions naturally.
- **Cost-Efficient Scaling** - Leverages Gemini's efficient API and Pinecone's high-speed indexing to manage thousands of customer queries at a low cost.

## How It Works
1. **Dual-Path Ingestion:** The process begins with an n8n Form where you provide a sitemap or individual URLs. The workflow automatically handles XML parsing and URL cleaning to prepare a list of pages for processing (a hedged sketch of this step appears at the end of this template description).
2. **Clean Content Extraction:** Using Decodo, the workflow fetches the HTML of each page and uses a specialized extraction node to strip away code, ads, and navigation, leaving only the high-value text content. Sign up using: dashboard.decodo.com/register?referral_code=55543bbdb96ffd8cf45c2605147641ee017e7900
3. **Vectorization & Storage:** The cleaned text is passed to the Gemini Embedding model, which converts the information into 3072-dimensional vectors. These are stored in a Pinecone "supportbot" index for instant retrieval.
4. **RAG-Powered Chat Agent:** When a user sends a message through the chat widget, an AI Agent takes over. It uses the user's query to search the Pinecone database for relevant business facts.
5. **Intelligent Response Generation:** The AI Agent passes the retrieved facts and the current chat history to Google Gemini, which generates a polite, accurate, and contextually relevant response for the user.

## Requirements
- **n8n Instance:** A self-hosted or cloud instance of n8n.
- **Google Gemini API Key:** For text embeddings and chat generation.
- **Pinecone Account:** An API key and a "supportbot" index to store your knowledge base.
- **Decodo Access:** For high-quality website content extraction.

## How to Use
1. **Initialize the Knowledge Base:** Use the Form Trigger to input your website URL or sitemap. Run the ingestion flow to populate your Pinecone index.
2. **Configure Credentials:** Authenticate your Google Gemini and Pinecone accounts within n8n.
3. **Deploy the Chatbot:** Enable the Chat Trigger node. Use the provided webhook URL to connect the backend to your website's frontend chat widget.
4. **Test & Refine:** Interact with the bot to ensure it retrieves the correct data, and update your knowledge base by re-running the ingestion flow whenever your website content changes.

## Business Use Cases
- **Customer Support Teams** - Automate answers to 80% of common FAQs using your existing documentation.
- **E-commerce Sites** - Help customers find product details, shipping policies, and return information instantly.
- **SaaS Providers** - Build an interactive technical documentation assistant to help users navigate your software.
- **Marketing Agencies** - Offer "AI-powered site search" as an add-on service for client websites.

## Efficiency Gains
- **Reduce Ticket Volume** by providing instant self-service options.
- **Eliminate Manual Data Entry** by scraping content directly from the live website.
- **Improve UX** with 24/7 availability and zero wait times for customers.

Difficulty Level: Intermediate
Estimated Setup Time: 30 min
Monthly Operating Cost: Low (variable based on AI usage and Pinecone tier)
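To illustrate the XML parsing and URL cleaning in step 1 of How It Works, here is a hypothetical Code node sketch. It assumes the sitemap XML arrives as a string in `$json.data`; the field name and the cleaning rules are assumptions to adjust for your setup.

```javascript
// Hypothetical sketch: pull <loc> entries out of a sitemap.xml string and clean them.
const xml = $input.first().json.data || '';

// Grab every <loc>...</loc> value with a simple regex (fine for standard sitemaps).
const locMatches = [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);

// Clean: trim, drop query strings and fragments, strip trailing slashes, de-duplicate.
const cleaned = [...new Set(
  locMatches
    .map((url) => url.trim().split('#')[0].split('?')[0].replace(/\/+$/, ''))
    .filter((url) => url.startsWith('http'))
)];

// One n8n item per page URL for the scraping step.
return cleaned.map((url) => ({ json: { url } }));
```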
by vinci-king-01
# Software Vulnerability Tracker with Pushover and Notion

⚠️ COMMUNITY TEMPLATE DISCLAIMER: This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template.

This workflow automatically scans multiple patent databases on a weekly schedule, filters new filings relevant to selected technology domains, saves the findings to Notion, and pushes instant alerts to your mobile device via Pushover. It is ideal for R&D teams and patent attorneys who need up-to-date insights on emerging technology trends and competitor activity.

## Pre-conditions/Requirements

### Prerequisites
- An n8n instance (self-hosted or n8n cloud)
- ScrapeGraphAI community node installed
- Active Notion account with an integration created
- Pushover account (user key & application token)
- List of technology keywords / CPC codes to monitor

### Required Credentials
- **ScrapeGraphAI API Key** – Enables web scraping of patent portals
- **Notion Credential** – Internal Integration Token with database write access
- **Pushover Credential** – App Token + User Key for push notifications

### Additional Setup Requirements

| Service | Needed Item | Where to obtain |
|---------|-------------|-----------------|
| USPTO, EPO, WIPO, etc. | Public URLs for search endpoints | Free/public |
| Notion | Database with properties: Title, Abstract, URL, Date | Create in Notion |
| Keyword List | Text file or environment variable PATENT_KEYWORDS | Define yourself |

## How it works

Key steps:
1. **Schedule Trigger**: Fires every week (default Monday 08:00 UTC).
2. **Code (Prepare Queries)**: Builds search URLs for each keyword and data source (a hedged sketch appears at the end of this template description).
3. **SplitInBatches**: Processes one query at a time to respect rate limits.
4. **ScrapeGraphAI**: Scrapes patent titles, abstracts, links, and publication dates.
5. **Code (Normalize & Deduplicate)**: Cleans data, converts dates, and removes already-logged patents.
6. **IF Node**: Checks whether new patents were found.
7. **Notion Node**: Inserts new patent entries into the specified database.
8. **Pushover Node**: Sends a concise alert summarizing the new filings.
9. **Sticky Notes**: Document configuration tips inside the workflow.

## Set up steps
Setup Time: 10-15 minutes

1. Install ScrapeGraphAI: In n8n, go to "Settings → Community Nodes" and install @n8n-nodes/scrapegraphai.
2. Add credentials:
   - ScrapeGraphAI: paste your API key.
   - Notion: add the internal integration token and select your database.
   - Pushover: provide your App Token and User Key.
3. Configure keywords: Open the first Code node and edit the keywords array (e.g., ["quantum computing", "Li-ion battery", "5G antenna"]).
4. Point to data sources: In the same Code node, adjust the sources array if you want to add/remove patent portals.
5. Set Notion database mapping: In the Notion node, map properties (Name, Abstract, Link, Date) to incoming JSON fields.
6. Adjust schedule (optional): Double-click the Schedule Trigger and change the CRON expression to your preferred interval.
7. Test run: Execute the workflow manually. Confirm that the Notion page is populated and a Pushover notification arrives.
8. Activate: Switch the workflow to "Active" to enable automatic weekly execution.

## Node Descriptions

Core workflow nodes:
- **Schedule Trigger** – Defines the weekly execution time.
- **Code (Build Search URLs)** – Dynamically constructs patent search URLs.
- **SplitInBatches** – Sequentially feeds each query to the scraper.
- **ScrapeGraphAI** – Extracts patent metadata from HTML pages.
- **Code (Normalize Data)** – Formats dates, adds UUIDs, and checks for duplicates.
- **IF** – Determines whether new patents exist before proceeding.
- **Notion** – Writes new patent records to your Notion database.
- **Pushover** – Sends real-time mobile/desktop notifications.

Data flow:
Schedule Trigger → Code (Build Search URLs) → SplitInBatches → ScrapeGraphAI → Code (Normalize Data) → IF → Notion & Pushover

## Customization Examples

### Change Notification Message

```javascript
// Inside the Pushover node "Message" field
return {
  message: `📜 ${items[0].json.count} new patent(s) detected in ${new Date().toDateString()}`,
  title: '🆕 Patent Alert',
  url: items[0].json.firstPatentUrl,
  url_title: 'Open first patent'
};
```

### Add Slack Notification Instead of Pushover

```javascript
// Replace the Pushover node with a Slack node
{
  text: `${$json.count} new patents published:\n${$json.list.join('\n')}`,
  channel: '#patent-updates'
}
```

## Data Output Format
The workflow outputs structured JSON data:

```json
{
  "title": "Quantum Computing Device",
  "abstract": "A novel qubit architecture that ...",
  "url": "https://patents.example.com/US20240012345A1",
  "publicationDate": "2024-06-01",
  "source": "USPTO",
  "keywordsMatched": ["quantum computing"]
}
```

## Troubleshooting

### Common Issues
- **No data returned** – Verify that search URLs are still valid and the ScrapeGraphAI selector matches the current page structure.
- **Duplicate entries in Notion** – Ensure the "Normalize Data" code correctly checks for existing URLs or IDs before insert.

### Performance Tips
- Limit the number of keywords or schedule the workflow during off-peak hours to reduce API throttling.
- Enable caching inside ScrapeGraphAI (if available) to minimize repeated requests.

### Pro Tips
- Use environment variables (e.g., `{{ $env.PATENT_KEYWORDS }}`) to manage keyword lists without editing nodes.
- Chain an additional "HTTP Request → ML Model" step to auto-classify patents by CPC codes.
- Create a Notion view filtered by publicationDate within the past 30 days for quick scanning.
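As referenced under Key Steps, here is a hypothetical sketch of what the Code (Prepare Queries / Build Search URLs) node could look like. The base URLs and query-parameter formats are placeholders, not the real endpoints used by the template; replace them with the portals you actually monitor.

```javascript
// Hypothetical sketch of the "Prepare Queries" Code node: one item per keyword/source pair.
// (The keyword list could also be read from an environment variable such as PATENT_KEYWORDS.)
const keywords = ['quantum computing', 'Li-ion battery', '5G antenna'];

// Placeholder search-URL builders; swap in the real portal endpoints you monitor.
const sources = [
  { name: 'USPTO', build: (kw) => `https://example-uspto-search/?q=${encodeURIComponent(kw)}` },
  { name: 'EPO',   build: (kw) => `https://example-epo-search/?query=${encodeURIComponent(kw)}` },
];

const queries = [];
for (const keyword of keywords) {
  for (const source of sources) {
    queries.push({
      json: {
        keyword,
        source: source.name,
        searchUrl: source.build(keyword), // consumed by SplitInBatches → ScrapeGraphAI
      },
    });
  }
}

return queries;
```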
by Bhuvanesh R
Your Cold Email is Now Researched. This pipeline finds specific bottlenecks on prospect websites and instantly crafts an irresistible pitch.

## 🎯 Problem Statement
Traditional high-volume cold email outreach is stuck on generic personalization (e.g., "Love your website!"). Sales teams, especially those selling high-value AI Receptionists, struggle to efficiently find the one Unique Operational Hook (like manual scheduling dependency or high call volume) needed to make the pitch relevant. This forces reliance on expensive, slow manual research, leading to low reply rates and inefficient spending on bulk outreach tools.

## ✨ Solution
This workflow deploys a resilient Dual-AI Personalization Pipeline that runs on a batch basis. It uses the Filter (Qualified Leads) node as a cost-saving Quality Gate to prevent processing bad leads. It executes a Targeted Deep Dive on successful leads, using GPT-4 for analytical insight extraction and Claude Sonnet for coherent, human-like copy generation. The entire process outputs campaign-ready data directly to Google Sheets and sends a critical QA Draft via Gmail.

## ⚙️ How It Works (Multi-Step Execution)

### 1. Ingestion and Cost Control (The Quality Gate)
- **Trigger and Ingestion:** The workflow starts via a **Manual Trigger**, pulling leads directly from **Get All Leads** (Google Sheets).
- **Cost Filtering:** The **Filter (Qualified Leads)** node removes leads that lack a working email or website URL.
- **Execution Isolation:** The **Loop Over Leads** node initiates individual processing. The **Capture Lead Data** (Set) node immediately captures and locks down the original lead context for stability throughout the loop.
- **Hybrid Scraping:** The **Scrape Site** (HTTP Request) and **Extract Text & Links** (HTML) nodes execute the **Hybrid Scraping** strategy, simultaneously capturing **website text** and **external links**.
- **Data Shaping & Status:** The **Filter Social & Status** (Code) node is the control center. It filters links, bundles the context, and critically, assigns a **status** of 'Success' or 'Scrape Fail'.
- **Cost Control Branch:** The **If** (IF node) checks this status. Items with 'Scrape Fail' bypass all AI steps (saving **100% of AI token costs**) and jump directly to **Log Final Result**. Successful items proceed to the AI core.

### 2. Dual-AI Coherence & Dispatch (The Executive Output)
- **Analytical Synthesis:** The **Summarize Website** (OpenAI) node uses **GPT-4** to synthesize the full context and extract the **Unique Operational Hook** (e.g., manual booking overhead).
- **Coherent Copy Generation:** The **Generate Subject & Body** (Anthropic) node uses the **Claude Sonnet** model to generate the subject and the multi-line body, guaranteeing **coherence** by creating both simultaneously in a single JSON output.
- **Final Parsing:** The **Parse AI Output** (Code) node reliably strips markdown wrappers and extracts the clean **subject** and **body** strings (a hedged sketch of this step appears after the Benefits list below).
- **Final Delivery:** The data is logged via **Log Final Result** (Google Sheets), and the completed email is sent to the user via **Create a draft** (Gmail) for final Quality Assurance before sending.

## 🛠️ Setup Steps
Before running the workflow, ensure these credentials and data structures are correctly configured:

### Credentials
- **Anthropic:** Configure credentials for the Language Model (Claude Sonnet).
- **OpenAI:** Configure credentials for the Analytical Model (GPT-4/GPT-4o).
- **Google Services:** Set up OAuth2 credentials for **Google Sheets** (Input/Output) and **Gmail** (Draft QA and Completion Alert).

### Configuration
- **Google Sheet Setup:** Your input sheet must include the columns **email**, **website_url**, and an empty Icebreaker column for initial filtering.
- **HTTP URL:** Verify that the **Scrape Site** node's URL parameter is set to pull the website URL from the stabilized data structure: `={{ $json.website_url }}`.
- **AI Prompts:** Ensure the Anthropic prompt contains your current Irresistible Sales Offer and the required nested JSON output structure.

## ✅ Benefits
- **Coherence Guarantee:** A single **Anthropic** node generates both the subject and body, guaranteeing the message is perfectly aligned and hits the same unique insight.
- **Maximum Cost Control:** The **IF node** prevents spending tokens on bad or broken websites, making the campaign highly **budget-efficient**.
- **Deep Personalization:** Combines **website text** and **social media links**, creating an icebreaker that implies thorough, manual research.
- **High Reliability:** Uses robust **Code nodes** for data structuring and parsing, ensuring the workflow runs consistently under real-world conditions without crashing.
- **Zero-Risk QA:** The final **Gmail (Create a draft)** step ensures human review of the generated copy before any cold emails are sent out.
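Here is a hypothetical sketch of the kind of logic the Parse AI Output Code node performs: stripping a Markdown code fence (such as a json-tagged fence) from the model's reply and pulling out clean subject and body strings. The input field name (`text`) and the JSON keys are assumptions to adapt to your Anthropic node's output.

```javascript
// Hypothetical sketch: strip markdown fences from the LLM reply and extract subject/body.
const raw = $input.first().json.text || '';

// Remove a leading ```json (or ```) fence and a trailing ``` fence if present.
const cleaned = raw
  .replace(/^\s*```(?:json)?\s*/i, '')
  .replace(/\s*```\s*$/, '')
  .trim();

let subject = '';
let body = '';
try {
  const parsed = JSON.parse(cleaned);
  subject = (parsed.subject || '').trim();
  body = (parsed.body || '').trim();
} catch (err) {
  // If parsing fails, surface the raw output so the draft can be reviewed manually.
  subject = 'REVIEW: could not parse AI output';
  body = raw;
}

return [{ json: { subject, body } }];
```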
by phil
This workflow is designed for B2B professionals to automatically identify and summarize business opportunities from a company's website. By leveraging Bright Data's Web Unblocker and advanced AI models from OpenRouter, it scrapes relevant company pages ("About Us", "Team", "Contact"), analyzes the content for potential pain points and needs, and synthesizes a concise, actionable report. The final output is formatted for direct use in documents, making it an ideal tool for sales, marketing, and business development teams to prepare for prospecting calls or personalize outreach.

## Who's it for
This template is ideal for:
- **B2B Sales Teams:** Quickly find and qualify leads by identifying specific business needs before a cold call.
- **Marketing Agencies:** Develop personalized content and value propositions based on a prospect's public website information.
- **Business Development Professionals:** Efficiently research potential partners or clients and discover collaboration opportunities.
- **Entrepreneurs:** Gain a competitive edge by understanding a competitor's strategy or a potential client's operations.

## How it works
1. The workflow is triggered by a chat message, typically a URL from an n8n chat application.
2. It uses Bright Data to scrape the website's sitemap and extract all anchor links from the homepage.
3. An AI agent analyzes the extracted URLs to filter for pages relevant to company information (e.g., "about-us," "team," "contact").
4. The workflow then scrapes the content of these specific pages.
5. A second AI agent summarizes the content of each page, looking for business opportunities related to AI-powered automation.
6. The summaries are merged and a final AI agent synthesizes them into a single, cohesive report, formatted for easy reading in a Google Doc.

## How to set up
1. Bright Data Credentials: Sign up for a Bright Data account and create a Web Unblocker zone. In n8n, create new Bright Data API credentials and copy your API key.
2. OpenRouter Credentials: Create an account on OpenRouter and get your API key. In n8n, create new OpenRouter API credentials and paste your key.
3. Chat Trigger Node: Configure the "When chat message received" node. Copy the production webhook URL to integrate with your preferred chat platform.

## Requirements
- An active n8n instance.
- A Bright Data account with a Web Unblocker zone.
- An OpenRouter account with API access.

## How to customize this workflow
- **AI Prompting:** Edit the "systemMessage" parameters in the "AI Agent", "AI Agent1", and "AI Agent2" nodes to change the focus of the opportunity analysis. For example, modify the prompts to search for specific technologies, industry jargon, or different types of business challenges.
- **Model Selection:** The workflow uses openai/o4-mini and openai/gpt-5. You can change these to other models available on OpenRouter by editing the model parameter in the OpenRouter Chat Model nodes.
- **Scraping Logic:** The extract url node uses a regular expression to find `<a>` (anchor) tags. This can be modified or replaced with an HTML Extraction node to target different elements or content on a website (a hedged example of such a regex appears below).
- **Output Format:** The final output is designed for Google Docs. You can modify the last "AI Agent2" node's prompt to generate the output in a different format, such as a simple JSON object or a markdown list.

Phil | Inforeole 🇫🇷 Contact us to automate your processes.
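As referenced under Scraping Logic, below is a hypothetical sketch of a Code node that extracts anchor-tag hrefs from the homepage HTML with a regular expression. The input field names (`data` holding the HTML, `url` holding the homepage address) and the relative-to-absolute handling are assumptions; the template's own extract url node may differ.

```javascript
// Hypothetical sketch: pull href values out of <a> tags in the scraped homepage HTML.
const html = $input.first().json.data || '';
const baseUrl = $input.first().json.url || 'https://example.com'; // homepage that was scraped

// Match href="..." or href='...' inside <a ...> tags.
const matches = [...html.matchAll(/<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi)].map((m) => m[1]);

// Resolve relative links against the homepage, drop fragments, and de-duplicate.
const links = [...new Set(
  matches
    .filter((href) => !href.startsWith('mailto:') && !href.startsWith('javascript:'))
    .map((href) => {
      try {
        const u = new URL(href, baseUrl);
        u.hash = ''; // drop in-page fragments
        return u.toString();
      } catch {
        return null; // skip malformed hrefs
      }
    })
    .filter(Boolean)
)];

return links.map((url) => ({ json: { url } }));
```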
by Mariela Slavenova
This template crawls a website from its sitemap, deduplicates URLs in Supabase, scrapes pages with Crawl4AI, cleans and validates the text, then stores content + metadata in a Supabase vector store using OpenAI embeddings. It's a reliable, repeatable pipeline for building searchable knowledge bases, SEO research corpora, and RAG datasets.

## Good to know
- Built-in de-duplication via a scrape_queue table (status: pending/completed/error).
- Resilient flow: waits, retries, and marks failed tasks.
- Costs depend on Crawl4AI usage and OpenAI embeddings.
- Replace any placeholders (API keys, tokens, URLs) before running.
- Respect website robots/ToS and applicable data laws when scraping.

## How it works
1. Sitemap fetch & parse — Load sitemap.xml, extract all URLs.
2. De-dupe — Normalize URLs, check the Supabase scrape_queue, insert only new ones (a sketch of this normalization appears at the end of this description).
3. Scrape — Send URLs to Crawl4AI; poll task status until completed.
4. Clean & score — Remove boilerplate/markup, detect content type, compute quality metrics, extract metadata (title, domain, language, length).
5. Chunk & embed — Split text, create OpenAI embeddings.
6. Store — Upsert into the Supabase vector store (documents) with metadata; update job status.

## Requirements
- Supabase (Postgres + Vector extension enabled)
- Crawl4AI API key (or header auth)
- OpenAI API key (for embeddings)
- n8n credentials set for HTTP, Postgres/Supabase

## How to use
1. Configure credentials (Supabase/Postgres, Crawl4AI, OpenAI).
2. (Optional) Run the provided SQL to create scrape_queue and documents.
3. Set your sitemap URL in the HTTP Request node.
4. Execute the workflow (manual trigger) and monitor Supabase statuses.
5. Query your documents table or vector store from your app/RAG stack.

## Potential Use Cases
This automation is ideal for:
- Market research teams collecting competitive data
- Content creators monitoring web trends
- SEO specialists tracking website content updates
- Analysts gathering structured data for insights
- Anyone needing reliable, structured web content for analysis

## Need help customizing?
Contact me for consulting and support: LinkedIn
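To illustrate the de-dupe step, here is a hypothetical sketch of the URL normalization a Code node could apply before checking the scrape_queue table. The actual insert and lookup are done by the workflow's Supabase/Postgres nodes; only the normalization and comparison logic is shown, and the `queuedUrls` input field is an assumption.

```javascript
// Hypothetical sketch: normalize sitemap URLs and keep only those not yet queued.
// Assumes the already-queued URLs were fetched earlier and attached as json.queuedUrls.
function normalizeUrl(raw) {
  try {
    const u = new URL(raw.trim());
    u.hash = '';                                        // drop fragments
    u.search = '';                                      // drop tracking/query params
    u.hostname = u.hostname.toLowerCase();
    u.pathname = u.pathname.replace(/\/+$/, '') || '/'; // strip trailing slashes
    return u.toString();
  } catch {
    return null;                                        // skip malformed entries
  }
}

const queued = new Set(
  ($input.first().json.queuedUrls || []).map(normalizeUrl).filter(Boolean)
);

const fresh = [];
for (const item of $input.all()) {
  const url = normalizeUrl(item.json.url);
  if (url && !queued.has(url)) {
    queued.add(url);                                    // also de-dupes within this batch
    fresh.push({ json: { url, status: 'pending' } });   // matches the scrape_queue statuses
  }
}

return fresh;
```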
by Growth AI
# SEO Content Generation Workflow - n8n Template Instructions

## Who's it for
This workflow is designed for SEO professionals, content marketers, digital agencies, and businesses who need to generate optimized meta tags, H1 headings, and content briefs at scale. Perfect for teams managing multiple clients or large keyword lists who want to automate competitor analysis and SEO content creation while maintaining quality and personalization.

## How it works
The workflow automates the entire SEO content creation process by analyzing your target keywords against top competitors, then generating optimized meta elements and comprehensive content briefs. It uses AI-powered analysis combined with real competitor data to create SEO-friendly content that's tailored to your specific business context. The system processes keywords in batches, performs Google searches, scrapes competitor content, analyzes heading structures, and generates personalized SEO content using your company's database information for maximum relevance.

## Requirements

### Required Services and Credentials
- **Google Sheets API**: For reading configuration and updating results
- **Anthropic API**: For AI content generation (Claude Sonnet 4)
- **OpenAI API**: For embeddings and vector search
- **Apify API**: For Google search results
- **Firecrawl API**: For competitor website scraping
- **Supabase**: For vector database (optional but recommended)

### Template Spreadsheet
Copy this template spreadsheet and configure it with your information: Template Link

## How to set up

### Step 1: Copy and Configure Template
1. Make a copy of the template spreadsheet.
2. Fill in the Client Information sheet:
   - Client name: Your company or client's name
   - Client information: Brief business description
   - URL: Website address
   - Supabase database: Database name (prevents AI hallucination)
   - Tone of voice: Content style preferences
   - Restrictive instructions: Topics or approaches to avoid
3. Complete the SEO sheet with your target pages:
   - Page: Page you're optimizing (e.g., "Homepage", "Product Page")
   - Keyword: Main search term to target
   - Awareness level: User familiarity with your business
   - Page type: Category (homepage, blog, product page, etc.)
### Step 2: Import Workflow
1. Import the n8n workflow JSON file.
2. Configure all required API credentials in n8n:
   - Google Sheets OAuth2
   - Anthropic API key
   - OpenAI API key
   - Apify API key
   - Firecrawl API key
   - Supabase credentials (if using vector database)

### Step 3: Test Configuration
1. Activate the workflow.
2. Send your Google Sheets URL to the chat trigger.
3. Verify that all sheets are readable and credentials work.
4. Test with a single keyword row first.

## Workflow Process Overview

### Phase 0: Setup and Configuration
- Copy template spreadsheet
- Configure client information and SEO parameters
- Set up API credentials in n8n

### Phase 1: Data Input and Processing
- Chat trigger receives Google Sheets URL
- System reads client configuration and SEO data
- Filters valid keywords and empty H1 fields
- Initiates batch processing

### Phase 2: Competitor Research and Analysis
- Searches Google for top 10 results per keyword
- Scrapes first 5 competitor websites
- Extracts heading structures (H1-H6)
- Analyzes competitor meta tags and content organization

### Phase 3: Meta Tags and H1 Generation
- AI analyzes keyword context and competitor data
- Accesses client database for personalization
- Generates optimized meta title (65 chars max)
- Creates compelling meta description (165 chars max)
- Produces user-focused H1 (70 chars max)

### Phase 4: Content Brief Creation
- Analyzes search intent percentages
- Develops content strategy based on competitor analysis
- Creates detailed MECE page structure
- Suggests rich media elements
- Provides writing recommendations and detail level scoring

### Phase 5: Data Integration and Updates
- Combines all generated content into unified structure
- Updates Google Sheets with new SEO elements
- Preserves existing data while adding new content
- Continues batch processing for remaining keywords

## How to customize the workflow

### Adjusting AI Models
- Replace Anthropic Claude with other LLM providers
- Modify system prompts for different content styles
- Adjust character limits for meta elements

### Modifying Competitor Analysis
- Change number of competitors analyzed (currently 5)
- Adjust scraping parameters in Firecrawl nodes
- Modify heading extraction logic in JavaScript nodes (a hedged sketch of this extraction follows the Troubleshooting section)

### Customizing Output Format
- Update Google Sheets column mapping in Code node
- Modify structured output parser schema
- Change batch processing size in Split in Batches node

### Adding Quality Controls
- Insert validation nodes between phases
- Add error handling and retry logic
- Implement content quality scoring

### Extending Functionality
- Add keyword research capabilities
- Include image optimization suggestions
- Integrate social media content generation
- Connect to CMS platforms for direct publishing

## Best Practices
- Test with small batches before processing large keyword lists
- Monitor API usage and costs across all services
- Regularly update system prompts based on output quality
- Maintain clean data in your Google Sheets template
- Use descriptive node names for easier workflow maintenance

## Troubleshooting
- **API Errors**: Check credential configuration and usage limits
- **Scraping Failures**: Firecrawl nodes have error handling enabled
- **Empty Results**: Verify keyword formatting and competitor availability
- **Sheet Updates**: Ensure proper column mapping in final Code node
- **Processing Stops**: Check batch processing limits and timeout settings
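As referenced under Modifying Competitor Analysis, here is a hypothetical sketch of the kind of logic a JavaScript Code node can use to extract the H1-H6 structure from scraped markdown. The input field name (`markdown`) is an assumption about how the scraped content reaches the node.

```javascript
// Hypothetical sketch: extract the H1-H6 outline from a competitor page's markdown.
const markdown = $input.first().json.markdown || '';

const headings = [];
for (const line of markdown.split('\n')) {
  // ATX-style markdown headings: one to six leading # characters.
  const match = line.match(/^(#{1,6})\s+(.*)$/);
  if (match) {
    headings.push({
      level: `H${match[1].length}`,      // e.g. "H2"
      text: match[2].trim(),
    });
  }
}

return [{
  json: {
    url: $input.first().json.url,
    headingCount: headings.length,
    outline: headings,                   // passed to the AI agent as competitor structure
  },
}];
```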
by Growth AI
# SEO Content Generation Workflow (Basic Version) - n8n Template Instructions

## Who's it for
This workflow is designed for SEO professionals, content marketers, digital agencies, and businesses who need to generate optimized meta tags, H1 headings, and content briefs at scale. Perfect for teams managing multiple clients or large keyword lists who want to automate competitor analysis and SEO content creation without the complexity of vector databases.

## How it works
The workflow automates the entire SEO content creation process by analyzing your target keywords against top competitors, then generating optimized meta elements and comprehensive content briefs. It uses AI-powered analysis combined with real competitor data to create SEO-friendly content that's tailored to your specific business context. The system processes keywords in batches, performs Google searches, scrapes competitor content, analyzes heading structures, and generates personalized SEO content using your company information for maximum relevance.

## Requirements

### Required Services and Credentials
- **Google Sheets API**: For reading configuration and updating results
- **Anthropic API**: For AI content generation (Claude Sonnet 4)
- **Apify API**: For Google search results
- **Firecrawl API**: For competitor website scraping

### Template Spreadsheet
Copy this template spreadsheet and configure it with your information: Template Link

## How to set up

### Step 1: Copy and Configure Template
1. Make a copy of the template spreadsheet.
2. Fill in the Client Information sheet:
   - Client name: Your company or client's name
   - Client information: Brief business description
   - URL: Website address
   - Tone of voice: Content style preferences
   - Restrictive instructions: Topics or approaches to avoid
3. Complete the SEO sheet with your target pages:
   - Page: Page you're optimizing (e.g., "Homepage", "Product Page")
   - Keyword: Main search term to target
   - Awareness level: User familiarity with your business
   - Page type: Category (homepage, blog, product page, etc.)
### Step 2: Import Workflow
1. Import the n8n workflow JSON file.
2. Configure all required API credentials in n8n:
   - Google Sheets OAuth2
   - Anthropic API key
   - Apify API key
   - Firecrawl API key

### Step 3: Test Configuration
1. Activate the workflow.
2. Send your Google Sheets URL to the chat trigger.
3. Verify that all sheets are readable and credentials work.
4. Test with a single keyword row first.

## Workflow Process Overview

### Phase 0: Setup and Configuration
- Copy template spreadsheet
- Configure client information and SEO parameters
- Set up API credentials in n8n

### Phase 1: Data Input and Processing
- Chat trigger receives Google Sheets URL
- System reads client configuration and SEO data
- Filters valid keywords and empty H1 fields
- Initiates batch processing

### Phase 2: Competitor Research and Analysis
- Searches Google for top 10 results per keyword using Apify
- Scrapes first 5 competitor websites using Firecrawl
- Extracts heading structures (H1-H6) from competitor pages
- Analyzes competitor meta tags and content organization
- Processes markdown content to identify heading hierarchies

### Phase 3: Meta Tags and H1 Generation
- AI analyzes keyword context and competitor data using Claude
- Incorporates client information for personalization
- Generates optimized meta title (65 characters maximum)
- Creates compelling meta description (165 characters maximum)
- Produces user-focused H1 (70 characters maximum)
- Uses structured output parsing for consistent formatting

### Phase 4: Content Brief Creation
- Analyzes search intent percentages (informational, transactional, navigational)
- Develops content strategy based on competitor analysis
- Creates detailed MECE page structure with H2 and H3 sections
- Suggests rich media elements (images, videos, infographics, tables)
- Provides writing recommendations and detail level scoring (1-10 scale)
- Ensures SEO optimization while maintaining user relevance

### Phase 5: Data Integration and Updates
- Combines all generated content into unified structure
- Updates Google Sheets with new SEO elements
- Preserves existing data while adding new content
- Continues batch processing for remaining keywords

## Key Differences from Advanced Version
This basic version focuses on core SEO functionality without additional complexity:
- **No Vector Database**: Removes Supabase integration for simpler setup
- **Streamlined Architecture**: Fewer dependencies and configuration steps
- **Essential Features Only**: Core competitor analysis and content generation
- **Faster Setup**: Reduced time to deployment
- **Lower Costs**: Fewer API services required

## How to customize the workflow

### Adjusting AI Models
- Replace Anthropic Claude with other LLM providers in the agent nodes
- Modify system prompts for different content styles or languages
- Adjust character limits for meta elements in the structured output parser (a hedged validation sketch appears at the end of these instructions)

### Modifying Competitor Analysis
- Change number of competitors analyzed (currently 5) by adding/removing Scrape nodes
- Adjust scraping parameters in Firecrawl nodes for different content types
- Modify heading extraction logic in JavaScript Code nodes

### Customizing Output Format
- Update Google Sheets column mapping in the final Code node
- Modify structured output parser schema for different data structures
- Change batch processing size in Split in Batches node

### Adding Quality Controls
- Insert validation nodes between workflow phases
- Add error handling and retry logic to critical nodes
- Implement content quality scoring mechanisms

### Extending Functionality
- Add keyword research capabilities with additional APIs
- Include image optimization suggestions
- Integrate social media content generation
- Connect to CMS platforms for direct publishing

## Best Practices

### Setup and Testing
- Always test with small batches before processing large keyword lists
- Monitor API usage and costs across all services
- Regularly update system prompts based on output quality
- Maintain clean data in your Google Sheets template

### Content Quality
- Review generated content before publishing
- Customize system prompts to match your brand voice
- Use descriptive node names for easier workflow maintenance
- Keep competitor analysis current by running it regularly

### Performance Optimization
- Process keywords in small batches to avoid timeouts
- Set appropriate retry policies for external API calls
- Monitor workflow execution times and optimize bottlenecks

## Troubleshooting

### Common Issues and Solutions

**API Errors**
- Check credential configuration in n8n settings
- Verify API usage limits and billing status
- Ensure proper authentication for each service

**Scraping Failures**
- Firecrawl nodes have error handling enabled to continue on failures
- Some websites may block scraping - this is normal behavior
- Check if competitor URLs are accessible and valid

**Empty Results**
- Verify keyword formatting in Google Sheets
- Ensure competitor websites contain the expected content structure
- Check if meta tags are properly formatted in system prompts

**Sheet Update Errors**
- Ensure proper column mapping in the final Code node
- Verify Google Sheets permissions and sharing settings
- Check that target sheet names match exactly

**Processing Stops**
- Review batch processing limits and timeout settings
- Check for errors in individual nodes using execution logs
- Verify all required fields are populated in input data

## Template Structure

### Required Sheets
- Client Information: Business details and configuration
- SEO: Target keywords and page information
- Results Sheet: Where generated content will be written

### Expected Columns
- **Keywords**: Target search terms
- **Description**: Brief page description
- **Type de page**: Page category
- **Awareness level**: User familiarity level
- **title, meta-desc, h1, brief**: Generated output columns

This streamlined version provides all essential SEO content generation capabilities while being easier to set up and maintain than the advanced version with vector database integration.
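As a quality-control example for the character limits mentioned in Phase 3 (65 for the meta title, 165 for the meta description, 70 for the H1), here is a hypothetical Code node sketch that flags over-length elements before they are written back to the sheet. The field names (`title`, `meta_desc`, `h1`) are assumptions to map to your output parser's schema.

```javascript
// Hypothetical sketch: validate generated SEO elements against the template's limits.
const LIMITS = { title: 65, meta_desc: 165, h1: 70 };

return $input.all().map((item) => {
  const issues = [];
  for (const [field, maxLen] of Object.entries(LIMITS)) {
    const value = (item.json[field] || '').trim();
    if (!value) issues.push(`${field} is empty`);
    else if (value.length > maxLen) issues.push(`${field} is ${value.length} chars (max ${maxLen})`);
  }
  return {
    json: {
      ...item.json,
      validation: issues.length ? `NEEDS REVIEW: ${issues.join('; ')}` : 'OK',
    },
  };
});
```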