by Harsh Agrawal
Automated SEO Intelligence Platform with DataForSEO and Claude

Transform any company website into a detailed SEO audit report in minutes. This workflow combines real-time web scraping, SEO data analysis, and AI reasoning to deliver client-ready reports automatically. It suits digital agencies scaling their audit services, freelance SEO consultants automating research, and SaaS teams analyzing competitor strategies before sales calls.

The Process

- Discovery Phase: Input a company name and website URL. The system begins with website content extraction.
- Intelligence Gathering: A dedicated scraper sub-workflow extracts all website content and converts it to structured markdown.
- Strategic Analysis: LLMs process the scraped content to understand the business model, target market, and competitive positioning, then generate business research insights and product strategy recommendations tailored to that company. Once this analysis completes, the DataForSEO API pulls technical metrics, backlink profiles, keyword rankings, and site health indicators.
- Report Assembly: All findings flow into a master report generator that structures the data into sections covering technical SEO, content strategy, competitive landscape, and actionable next steps. Custom branded cover and closing pages are added.
- Delivery: The HTML report is converted to PDF and emailed directly to your recipient, with no manual intervention.

Setup Steps

- Add API credentials: OpenRouter (for AI), DataForSEO (for scraping/SEO data), and PDFco (for PDF generation)
- Configure email sending through your preferred service (Gmail, SendGrid, etc.)
- Optional: Upload custom first/last page PDFs for white-label branding
- Test with your own website first to see the workflow end to end

Customize It

- Adjust analysis depth: Modify the AI prompts to focus on specific SEO aspects (local SEO, e-commerce, B2B SaaS, etc.)
- Change report style: Edit the HTML template in the Sample_Code node for different formatting
- Add integrations: Connect to your CRM to automatically trigger reports when leads enter your pipeline
- Scale it up: Process multiple URLs in batch by feeding in a Google Sheet of prospects

What You'll Need

- OpenRouter account (Claude Opus 4.1 recommended for best insights)
- DataForSEO subscription (handles both scraping and SEO metrics)
- PDFco account (converts your reports to professional PDFs)
- Email service credentials configured in n8n

Need Help? Connect with me on LinkedIn if you have any questions.
by Ranjan Dailata
Who this is for

This workflow is designed for:

- Automation engineers building AI-powered data pipelines
- Product managers & analysts needing structured insights from web pages
- Researchers & content teams extracting summaries from documentation or articles
- HR, compliance, and knowledge teams converting unstructured web content into structured records
- n8n self-hosted users leveraging advanced scraping and LLM enrichment

It is ideal for anyone who wants to transform any public URL into structured data plus clean summaries automatically.

What problem this workflow solves

Web content is often unstructured, verbose, and inconsistent, making it difficult to:

- Extract structured fields reliably
- Generate consistent summaries
- Reuse data across spreadsheets, dashboards, or databases
- Eliminate manual copy-paste and interpretation

This workflow turns arbitrary web pages into machine-readable JSON and human-readable summaries, without custom scrapers or manual parsing logic.

What this workflow does

The workflow integrates Decodo, Google Gemini, and Google Sheets to perform automated extraction of structured data. Here's how it works step by step:

- Input Setup: The workflow begins when the user executes it manually or passes a valid URL. The input includes url.
- Profile Extraction with Decodo: Accepts any valid URL as input, scrapes the page content using Decodo, and uses Google Gemini to extract structured data in JSON format and generate a concise, factual summary.
- JSON Parsing & Merging: The Code Node cleans and parses the JSON output from the AI for reliable downstream use. The Merge Node combines the structured data and the AI-generated summary.
- Data Storage in Google Sheets: The Google Sheets Node appends or updates the record, storing the structured JSON and summary in a connected spreadsheet.
- End Output: Unified, machine-readable JSON data plus an executive-level summary suitable for data analysis or downstream automation.

Setup Instructions

Prerequisites

- **n8n account** with workflow editor access
- **Decodo API credentials** - register, log in, and obtain the Basic Authentication Token via the Decodo Dashboard
- **Google Gemini (PaLM) API access**
- **Google Sheets OAuth credentials**

Setup Steps

- Import the workflow into your n8n instance.
- Configure Credentials: Add your Decodo API credentials in the Decodo node. Connect your Google Gemini (PaLM) credentials for both AI nodes. Authenticate your Google Sheets account.
- Edit Input Node: In the Set the Input Fields node, replace the default URL with your desired profile or dynamic data source.
- Run the Workflow: Trigger manually or via webhook integration for automation. Verify that the structured profile data and summary are written to the linked Google Sheet.

How to customize this workflow to your needs

You can easily extend or adapt this workflow:

- Modify Structured Output: Change the Gemini extraction prompt to match your own JSON schema. Add required fields such as authors, dates, entities, or metadata.
- Improve Summarization: Adjust summary length or tone (technical, executive, simplified). Add multi-language summarization using Gemini.
- Change Output Destination: Replace Google Sheets with databases (Postgres, MySQL), Notion, Slack / Email, or file storage (JSON, CSV).
- Add Validation or Filtering: Insert IF nodes to reject incomplete data, detect errors or hallucinated output, and trigger alerts for malformed JSON.
- Scale the Workflow: Replace the manual trigger with a Webhook, a Scheduled trigger, or batch URL processing.

Summary

This workflow provides a generic solution for converting unstructured web pages into structured, AI-enriched datasets. By combining Decodo for scraping, Google Gemini for intelligence, and Google Sheets for persistence, it enables repeatable, scalable data extraction without custom scrapers or brittle parsing logic.
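The JSON Parsing & Merging step above could be sketched roughly as follows. This is a minimal illustration, not the template's actual Code node: the function names `parseModelJson` and `mergeRecord` are hypothetical, and it assumes the model may wrap its JSON in a Markdown code fence or add commentary around it.

```javascript
// Hypothetical helper: clean and parse JSON text returned by an LLM.
// Assumption: the reply may be wrapped in a Markdown code fence.
function parseModelJson(raw) {
  const text = String(raw ?? "").trim();
  // Strip a leading fence (three backticks plus optional language tag)
  // and a trailing fence, if present.
  const unfenced = text
    .replace(/^`{3}[a-zA-Z]*\s*/, "")
    .replace(/\s*`{3}$/, "");
  // Keep only the outermost {...} span in case the model added commentary.
  const start = unfenced.indexOf("{");
  const end = unfenced.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return { ok: false, error: "no JSON object found", data: null };
  }
  try {
    const data = JSON.parse(unfenced.slice(start, end + 1));
    return { ok: true, error: null, data };
  } catch (err) {
    return { ok: false, error: err.message, data: null };
  }
}

// Hypothetical merge step: combine the structured fields with the summary.
function mergeRecord(parsed, summary) {
  return { ...(parsed.ok ? parsed.data : {}), summary, parseError: parsed.error };
}
```

Returning an `ok` flag instead of throwing lets a downstream IF node route malformed output to an alert branch, which matches the validation idea in the customization list.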
by Abdullah Alshiekh
What Problem Does It Solve?

We've all been there: you want to check if a product is cheaper on Amazon or Jumia, but opening a dozen tabs is a pain. Building a bot to do this usually fails because big e-commerce sites love to block scrapers with CAPTCHAs. This workflow fixes that headache by:

- Taking a product name from a chat message.
- Using Decodo to handle the hard part: searching Google and scraping the product pages without getting blocked.
- Using AI to read the messy HTML and pull out just the price and product name.
- Sending a clean "Best Price" summary back to the user instantly.

How to Configure It

- Telegram Setup: Create a bot with BotFather and paste your token into the Telegram node. Make sure your webhook is set up so the bot actually "hears" the messages.
- Decodo: This is the engine that makes the workflow reliable. You'll need to add your Decodo API key in the credentials. We used Decodo here specifically because it handles the proxies and browser fingerprinting for you, so your Amazon requests actually go through instead of failing.
- AI Setup: Plug in your OpenAI API key (or swap the node for Claude/Gemini if you prefer). The system prompt is already set up to ignore ads and find the real price, but feel free to tweak the tone.

How It Works

- Trigger: You text the bot a product name (e.g., "Sony XM5").
- Search: The workflow asks Decodo to Google that specific term on sites like Amazon.eg.
- Scrape: It grabs the URLs and passes them back to Decodo to fetch the page content safely.
- Extract: The AI reads through the text, finds the lowest price, and ignores the clutter.
- Reply: The bot texts you back with the best deal found.

Customization Ideas

- **Go wider:** Edit the search query to check other stores like Noon or Carrefour.
- **Track trends:** Connect a Google Sheet to log what people are searching for, which is useful for market research.

If you need any help, get in touch.
by Amirul Hakimi
🚀 Enrich CRM Leads with LinkedIn Company Data Using AI

Who's it for

Sales teams, marketers, and business development professionals who need to automatically enrich their CRM records with detailed company information from LinkedIn profiles. Perfect for anyone doing B2B outreach who wants to personalize their messaging at scale.

What it does

This workflow transforms bare-bones lead records into rich, personalized prospect profiles by:

- Automatically scraping LinkedIn company profiles
- Using AI (GPT-4) to extract key business intelligence
- Generating 15+ email-ready personalization variables
- Updating your CRM with structured, actionable data

The workflow pulls company overviews, products/services, funding information, and recent posts, and converts everything into natural-language variables that can be dropped directly into your outreach templates.

How it works

- Trigger: Workflow starts when a new lead is added to Airtable (or on schedule)
- Fetch: Retrieves the lead record containing the LinkedIn company URL
- Scrape: Pulls the raw HTML from the company's LinkedIn profile
- Clean: Strips HTML tags and formats content for AI processing
- Analyze: GPT-4 extracts structured company intelligence (overview, products, market presence, recent posts)
- Transform: Converts analysis into 15+ email-ready variables with natural phrasing
- Update: Writes enriched data back to your CRM

Setup Requirements

- **Airtable account** (free tier works fine)
- **OpenAI API key** (GPT-4o-mini recommended for cost-effectiveness)
- **LinkedIn company URLs** stored in your CRM
- **5 minutes** for initial configuration

How to set up

- Configure Airtable Connection: Replace YOUR_AIRTABLE_BASE_ID with your base ID. Replace YOUR_TABLE_ID with your leads table ID. Ensure your table has a "LinkedIn Organization URL" field. Add your Airtable API credentials.
- Add OpenAI Credentials: Click on both OpenAI nodes. Add your OpenAI API key. GPT-4o-mini is recommended (cost-effective and fast).
- Set Up Trigger: Add a trigger node (Schedule, Webhook, or Airtable trigger). Configure it to run when new leads are added or on a daily schedule.
- Test the Workflow: Add a test lead with a LinkedIn company URL. Execute
by Omer Fayyaz
An intelligent web scraping workflow that automatically routes URLs to site-specific extraction logic, normalizes data across multiple sources, and filters content by freshness to build a unified article feed.

What Makes This Different:

- **Intelligent Source Routing** - Uses a Switch node to route URLs to specialized extractors based on source identifier, enabling custom CSS selectors per publisher for maximum accuracy
- **Universal Fallback Parser** - Regex-based extractor handles unknown sources automatically, extracting title, description, author, date, and images from meta tags and HTML patterns
- **Freshness Filtering** - Built-in 45-day freshness threshold filters outdated content before saving, with configurable date validation logic
- **Tier-Based Classification** - Automatically categorizes articles into Tier 1 (0-7 days), Tier 2 (8-14 days), Tier 3 (15-30 days), or Archive based on publication date
- **Rate Limiting & Error Handling** - Built-in 3-second delays between requests prevent server overload, and error handling continues processing even if individual URLs fail
- **Status Tracking** - Updates the source spreadsheet with processing status, enabling easy monitoring and retry logic for failed extractions

Key Benefits of Multi-Source Content Aggregation:

- **Scalable Architecture** - Add new sources by adding a Switch rule and extraction node; no code changes are needed for most sites
- **Data Normalization** - Standardizes extracted data across all sources into a consistent format (title, description, author, date, image, canonical URL)
- **Automated Processing** - Schedule-based execution (every 4 hours) or manual triggers keep your feed updated without manual intervention
- **Quality Control** - Freshness filtering ensures only recent, relevant content enters your feed, reducing noise from outdated articles
- **Flexible Input** - Reads from Google Sheets, making it easy to add URLs in bulk or integrate with other systems
- **Comprehensive Metadata** - Captures full article metadata including canonical URLs, publication dates, author information, and featured images

Who's it for

This template is designed for content aggregators, news monitoring services, content marketers, SEO professionals, researchers, and anyone who needs to collect and normalize articles from multiple websites. It's perfect for organizations that need to monitor competitor content, aggregate industry news, build content databases, track publication trends, or create unified article feeds without manually scraping each site or writing custom scrapers for every source.

How it works / What it does

This workflow creates a unified article aggregation system that reads URLs from Google Sheets, routes them to site-specific extractors, normalizes the data, filters by freshness, and saves results to a feed. The system:

- Reads Pending URLs - Fetches URLs with source identifiers from Google Sheets, filtering for entries with "Pending" status
- Processes with Rate Limiting - Loops through URLs one at a time with a 3-second delay between requests to respect server resources
- Fetches HTML Content - Downloads page HTML with proper browser headers (User-Agent, Accept, Accept-Language) to avoid blocking
- Routes by Source - Switch node directs URLs to specialized extractors (Site A, B, C, D) or the universal fallback parser based on the Source field
- Extracts Article Data - Site-specific HTML nodes use custom CSS selectors, while the fallback uses regex patterns to extract title, description, author, date, image, and canonical URL
- Normalizes Data - Standardizes all extracted fields into a consistent format, handling missing values and trimming whitespace
- Filters by Freshness - Validates publication dates and filters out articles older than 45 days (configurable threshold)
- Calculates Tier & Status - Assigns tier classification and freshness status based on article age
- Saves to Feed - Appends normalized articles to the Article Feed sheet with all metadata
- Updates Status - Marks processed URLs as complete in the source sheet for tracking

Key Innovation: Source-Based Routing - Unlike generic scrapers that use one-size-fits-all extraction, this workflow uses routing to apply site-specific CSS selectors. This improves extraction accuracy while maintaining a universal fallback for unknown sources, making it both precise and extensible.

How to set up

1. Prepare Google Sheets

- Create a Google Sheet with two tabs: "URLs to Process" and "Article Feed"
- In the "URLs to Process" sheet, create columns: URL, Source, Status
- Add sample data: URLs in the URL column, source identifiers (e.g., "Site A", "Site B") in the Source column, and "Pending" in the Status column
- In the "Article Feed" sheet, the workflow will automatically create columns: Title, Description, Author, datePublished, imageUrl, canonicalUrl, source, sourceUrl, tier, freshnessStatus, extractedAt
- Verify your Google Sheets credentials are set up in n8n (OAuth2 recommended)

2. Configure Google Sheets Nodes

- Open the "Read Pending URLs" node and select your spreadsheet from the document dropdown
- Set sheet name to "URLs to Process"
- Configure the "Save to Article Feed" node: select the same spreadsheet, set sheet name to "Article Feed", operation should be "Append or Update"
- Configure the "Update URL Status" node: same spreadsheet, "URLs to Process" sheet, operation "Update"
- Test the connection by running the "Read Pending URLs" node manually to verify it can access your sheet

3. Customize Source Routing

- Open the "Source Router" (Switch node) to see current routing rules for Site A, B, C, D, and fallback
- To add a new source: Click "Add Rule", set condition: {{ $('Loop Over URLs').item.json.Source }} equals your source name
- Create a new HTML extraction node for your source with appropriate CSS selectors
- Connect the new extractor to the "Normalize Extracted Data" node
- Update the Switch node to route to your new extractor

Example CSS selectors for common sites:

```
// WordPress sites
title: "h1.entry-title, .post-title"
author: ".author-name, .byline a"
date: "time.entry-date, time[datetime]"

// Modern CMS
title: "h1.article__title, article h1"
author: ".article__byline a, a[rel='author']"
date: "time[datetime], meta[property='article:published_time']"
```

4. Configure Freshness Threshold

- Open the "Freshness Filter (45 days)" IF node
- The current threshold is 45 days (configurable in the condition expression)
- To change the threshold: Modify the expression cutoffDate.setDate(cutoffDate.getDate() - 45) to your desired number of days
- The filter marks articles as "Fresh" (within threshold) or routes them to the "Outdated" handler
- Test with sample URLs to verify date parsing works correctly for your sources

5. Set Up Scheduling & Test

- The workflow includes both a Manual Trigger (for testing) and a Schedule Trigger (runs every 4 hours)
- To customize the schedule: Open the "Schedule (Every 4 Hours)" node and adjust the interval
- For initial testing: Use the Manual Trigger and add 2-3 test URLs to your sheet with Status="Pending"
- Verify execution: Check that URLs are fetched, routed correctly, extracted, and saved to Article Feed
- Monitor the "Completion Summary" node output to see processing statistics
- Check execution logs for any errors in HTML extraction or date parsing
- Common issues: Missing CSS selectors (update extractor), date format mismatches (adjust date parsing), or rate limiting (increase wait time if needed)

Requirements

- **Google Sheets Account** - Active Google account with OAuth2 credentials configured in n8n for reading and writing spreadsheet data
- **Source Spreadsheet** - Google Sheet with "URLs to Process" and "Article Feed" tabs, properly formatted with required columns
- **n8n Instance** - Self-hosted or cloud n8n instance with access to external websites (HTTP Request node needs internet connectivity)
- **Source Knowledge** - Understanding of target website HTML structure to configure CSS selectors for site-specific extractors (or use the fallback parser for unknown sources)
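The freshness filter and tier classification described above can be sketched in a few lines. This is an illustration under stated assumptions, not the template's actual node code: `classifyArticle` is a hypothetical name, ages are counted in whole days, and dates in the future are treated as invalid (the description does not say how the workflow handles them).

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: apply the 45-day freshness threshold and the
// Tier 1 / Tier 2 / Tier 3 / Archive buckets from the description.
function classifyArticle(datePublished, now = new Date(), thresholdDays = 45) {
  const published = new Date(datePublished);
  if (Number.isNaN(published.getTime())) {
    return { fresh: false, tier: null, ageDays: null, reason: "invalid date" };
  }
  const ageDays = Math.floor((now.getTime() - published.getTime()) / MS_PER_DAY);
  if (ageDays < 0) {
    // Assumption: a publication date in the future is a parsing error.
    return { fresh: false, tier: null, ageDays, reason: "future date" };
  }
  let tier = "Archive";
  if (ageDays <= 7) tier = "Tier 1";
  else if (ageDays <= 14) tier = "Tier 2";
  else if (ageDays <= 30) tier = "Tier 3";
  return { fresh: ageDays <= thresholdDays, tier, ageDays, reason: null };
}
```

Note that with these boundaries an article aged 31 to 45 days passes the freshness filter but lands in the Archive tier; changing `thresholdDays` only moves the cutoff, not the tier buckets.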
by Abdullah Alshiekh
What Problem Does It Solve?

- Brands and marketers spend hours manually searching Google for product reviews.
- Reading through multiple websites to gauge general sentiment is tedious and inefficient.
- It is difficult to spot recurring customer complaints or praises without aggregating data.

This workflow solves these by:

- Instantly searching and scraping review content from the web.
- Using AI to read and score the sentiment of every review found.
- Generating a consolidated "Executive Summary" with key quotes and actionable advice.

How to Configure It

- Telegram Setup: Connect your Telegram Bot credentials in n8n. Set the Get Message node to watch for text messages.
- Search & Scraping (Decodo): Connect your Decodo credentials (requires a Web Scraping API plan). This handles both the Google Search and the content extraction.
- AI Setup: Add your Google Gemini API key. The prompts are pre-configured to act as a "Strict Data Analyst," but you can edit the system prompt in the AI Agent node to match your preferred tone.

How It Works

- **Trigger:** You send a company or product name (e.g., "XQ Pharma") to your Telegram bot.
- **Search:** The workflow uses Decodo to Google search for "[Name] reviews" and extracts the top URL results.
- **Scrape:** It visits the review pages and strips away the HTML code to get clean text.
- **Analyze (Loop):** The first AI Agent reads the text and determines the sentiment (Positive/Neutral/Negative) and key topics.
- **Report:** A second AI Agent collects all the analysis pieces and writes a final summary containing a **Sentiment Score**, **Customer Voice** (direct quotes), and an **Actionable Verdict**.
- **Delivery:** The final report is sent back to you as a Telegram message.

Customization Ideas

- **Change the Source:** Modify the search query to target specific platforms (e.g., "site:reddit.com [Product] reviews").
- **Change the Output:** Send the final report to a **Slack channel** or **Email** for your team to see.
- **Database Logging:** Save the "Actionable Verdict" and sentiment scores into **Notion** or **Airtable** to track brand reputation over time.
- **Competitor Analysis:** Use it to research competitor products instead of your own to find their weaknesses.

If you need any help, get in touch.
by Khairul Muhtadin
Stop wasting hours manually hunting for business leads. This workflow automates the entire process, from scraping Google Maps to extracting contact emails, all triggered from your phone via Telegram.

What It Does

Send a single message to your Telegram bot (Sector; Limit; MapsURL) and the system takes over. It scrapes business data from Google Maps using Apify, generates AI-powered company summaries via OpenAI, hunts for contact emails from business websites using Jina AI, then stores everything neatly in Google Sheets.

Who It's For

Sales reps building cold outreach lists, marketing agencies prospecting new clients, or anyone who needs targeted local business data fast, without paying for overpriced lead databases.

Why It's Worth It

- Manual research that takes 4 hours gets done in under 5 minutes for 50 leads.
- Pay only for what you use (Apify + OpenAI) instead of fixed monthly subscriptions.
- AI deduplication keeps your CRM clean and consistent.

What You'll Need

| Tool | Purpose |
|------|---------|
| n8n | Workflow engine |
| Apify | Google Maps scraper |
| OpenAI API | Summaries & email extraction |
| Google Sheets | Lead storage |
| Telegram Bot | Mobile trigger interface |
| Jina AI | Website-to-text conversion |

Quick Setup

- Import the JSON workflow into your n8n instance
- Connect credentials: Telegram bot token, Apify API key, OpenAI key, Google account
- Set up your Sheet with the matching column headers
- Test with: Coffee Shops; 5; https://www.google.com/maps/search/coffee+shops+london

How the Logic Works

The workflow runs a two-stage loop per business. First it saves core data (name, phone, address). If a website exists, it then attempts email enrichment. This way, you never lose basic lead data even if a website crawl fails.

Extend It Further

Swap Google Sheets for HubSpot or Pipedrive, push results to a Slack sales channel, or chain a Gmail node to auto-send intro emails the moment a lead is found.
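As a rough illustration of the trigger format (Sector; Limit; MapsURL), the incoming Telegram text could be split and validated like this. The function name `parseLeadRequest` and the validation rules (positive integer limit, parseable URL) are assumptions made for the sketch, not details taken from the workflow itself.

```javascript
// Hypothetical parser for a "Sector; Limit; MapsURL" message.
function parseLeadRequest(message) {
  // Split on the first two semicolons only, in case the URL contains one.
  const [sectorPart = "", limitPart = "", ...rest] = String(message).split(";");
  const sector = sectorPart.trim();
  const limitText = limitPart.trim();
  const mapsUrl = rest.join(";").trim();

  if (!sector || !limitText || !mapsUrl) {
    throw new Error("Expected format: Sector; Limit; MapsURL");
  }
  if (!/^\d+$/.test(limitText) || Number(limitText) < 1) {
    throw new Error("Limit must be a positive whole number");
  }
  // new URL() throws a TypeError if the text is not a valid absolute URL.
  const parsedUrl = new URL(mapsUrl);

  return { sector, limit: Number(limitText), mapsUrl: parsedUrl.href };
}
```

Called with the test message from Quick Setup, this returns the sector `Coffee Shops`, the limit `5`, and the Google Maps search URL; a message such as `Coffee Shops; five; ...` throws before any scraping credits are spent.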
Created by: Khaisa Studio Category: Marketing | Tags: Lead Gen, AI, Google Maps, Telegram Need custom workflows? Contact us Connect with the creator: Portfolio • Workflows • LinkedIn • Medium • Threads
by Kev
⚠️ Important: This workflow uses community nodes (JsonCut, Blotato) and requires a self-hosted n8n instance.

This n8n template automates the entire process of transforming blog articles and other web pages into short-form informational videos for Instagram. It scrapes content, generates AI-powered video clips, adds voiceover and subtitles, and publishes directly to social media, all with proper source attribution and branding.

Who's it for

Content creators, digital marketers, and social media managers who want to repurpose quality blog content into engaging video formats. Perfect for those running content marketing operations who need to maintain a consistent social media presence without manual video editing.

What it does

The workflow takes a blog article URL as input and produces a fully composed Instagram-ready video with:

- AI-generated background video clips matching the content
- Professional text-to-speech narration
- Auto-generated subtitles with word-by-word animations
- Background music from Creative Commons sources
- Branding overlay and source attribution
- Smooth transitions between scenes
- Direct publishing to Instagram

How it works

- Content Extraction: Firecrawl scrapes the blog article and extracts clean markdown content
- Content Summarization: An LLM via OpenRouter condenses the article into digestible talking points (max 1,000 characters)
- Script Generation: A second LLM generates 3-5 video prompts, narration text, and a social media caption in structured JSON format
- Video Generation: Google Veo API creates 8-second background clips in 9:16 format for each prompt
- Audio Creation: OpenAI TTS converts the narration to speech, while the Openverse API fetches royalty-free background music
- File Upload: All assets (videos, voice, music) are uploaded to JsonCut's storage
- Video Composition: JsonCut merges everything together with auto-subtitles, transitions, branding overlays, and source attribution
- Publishing: Blotato uploads the final video to Instagram as a reel with the generated caption

Setup requirements

Required accounts and credentials:

- **Firecrawl API** - for web scraping
- **OpenRouter API** - for LLM access (uses GPT-4 Mini in this template)
- **Google Gemini API** - for Veo video generation (note: 10 requests/day free tier limit)
- **OpenAI** - for text-to-speech generation
- **JsonCut account** - for video composition and file hosting
- **Blotato account** - for Instagram publishing
- **Instagram Business account** connected to Blotato

Installation steps:

- Install community nodes: @mendable/n8n-nodes-firecrawl, n8n-nodes-jsoncut, @blotato/n8n-nodes-blotato
- Configure all API credentials in n8n's credential manager
- Update the Blotato Instagram account ID in the "Create Instagram post" node
- Replace the branding overlay image URL in the JsonCut "Generate media" node config: "path": "https://your-logo-url.png"
- Test with the chat trigger by entering a blog article URL

Good to know

- Cost considerations: Blotato costs $29 (there are many cheaper alternatives available). JsonCut is free (but the Pro subscription is required for the auto caption feature). Veo 3 fast costs approximately $0.15 per second.
- Rate limits: Google Veo free tier is limited to 10 requests per day, which means ~2-3 complete workflows daily
- Processing time: Full workflow takes 5-10 minutes depending on Veo API response times
- Source attribution: The workflow automatically extracts the domain from the input URL and displays it on the first video clip
- Video quality: Output depends heavily on input quality. The workflow is designed for repurposing legitimate content
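The rate-limit and cost figures above follow from simple arithmetic, and the source attribution step amounts to reading the hostname from the input URL. The sketch below works both out; the function names are hypothetical, the constants are the approximate figures quoted in the text, and stripping a leading `www.` is an assumption about how the domain is displayed.

```javascript
// Figures quoted in the description (approximate, subject to change).
const VEO_DAILY_REQUESTS = 10;   // free-tier requests per day
const CLIP_SECONDS = 8;          // length of each generated clip
const VEO_CENTS_PER_SECOND = 15; // roughly $0.15 per second for Veo 3 fast

// Hypothetical helper: estimate per-run cost and runs per day for a
// given number of clips (the script step produces 3-5 prompts).
function estimateVeoBudget(clipsPerVideo) {
  const costCents = clipsPerVideo * CLIP_SECONDS * VEO_CENTS_PER_SECOND;
  return {
    costCents,
    costUsd: (costCents / 100).toFixed(2),
    runsPerDay: Math.floor(VEO_DAILY_REQUESTS / clipsPerVideo),
  };
}

// Hypothetical helper: derive the attribution label from the input URL.
function sourceDomain(articleUrl) {
  return new URL(articleUrl).hostname.replace(/^www\./, "");
}
```

With 3 clips per video the free tier allows three runs a day, and with 5 clips it allows two, which is where the "~2-3 complete workflows daily" estimate comes from.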
by vinci-king-01
Product Price Monitor with Mailgun and MongoDB ⚠️ COMMUNITY TEMPLATE DISCLAIMER: This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template. This workflow automatically scrapes multiple e-commerce sites, records weekly product prices in MongoDB, analyzes seasonal trends, and emails a concise report to retail stakeholders via Mailgun. It helps retailers make informed inventory and pricing decisions by providing up-to-date pricing intelligence. Pre-conditions/Requirements Prerequisites n8n instance (self-hosted, desktop, or n8n.cloud) ScrapeGraphAI community node installed and activated MongoDB database (Atlas or self-hosted) Mailgun account with a verified domain Publicly reachable n8n Webhook URL (if self-hosted) Required Credentials ScrapeGraphAI API Key** – Enables web scraping across target sites MongoDB Credentials** – Connection string (MongoDB URI) with read/write access Mailgun API Key & Domain** – To send summary emails MongoDB Collection Schema | Field | Type | Example Value | Notes | |-----------------|----------|---------------------------|---------------------------------------------| | productId | String | SKU-12345 | Unique identifier you define | | productName | String | Women's Winter Jacket | Human-readable name | | timestamp | Date | 2024-09-15T00:00:00Z | Ingest date (automatically added) | | price | Number | 79.99 | Scraped price | | source | String | example-shop.com | Domain where price was scraped | How it works This workflow automatically scrapes multiple e-commerce sites, records weekly product prices in MongoDB, analyzes seasonal trends, and emails a concise report to retail stakeholders via Mailgun. It helps retailers make informed inventory and pricing decisions by providing up-to-date pricing intelligence. Key Steps: Webhook Trigger**: Starts the workflow on a scheduled HTTP call or manual trigger. 
- **Code (Prepare Products)**: Defines the list of SKUs/URLs to monitor.
- **Split In Batches**: Processes products in manageable chunks to respect rate limits.
- **ScrapeGraphAI (Scrape Price)**: Extracts price, availability, and currency from each product URL.
- **Merge (Combine Results)**: Re-assembles all batch outputs into one dataset.
- **MongoDB (Upsert Price History)**: Stores each price point for historical analysis.
- **If (Seasonal Trend Check)**: Compares current price against historical average to detect anomalies.
- **Set (Email Payload)**: Formats the trend report for email.
- **Mailgun (Send Email)**: Emails weekly summary to specified recipients.
- **Respond to Webhook**: Returns “200 OK – Report Sent” response for logging.

Set up steps

Setup Time: 15-20 minutes

1. Install Community Node: In n8n, go to “Settings → Community Nodes” and install @n8n-community/nodes-scrapegraphai.
2. Create Credentials: Add ScrapeGraphAI API key under Credentials. Add MongoDB credentials (type: MongoDB). Add Mailgun credentials (type: Mailgun).
3. Import Workflow: Download the JSON template, then in n8n click “Import” and select the file.
4. Configure Product List: Open the Code (Prepare Products) node and replace the example array with your product objects { id, name, url }.
5. Adjust Cron/Schedule: If you prefer a fully automated schedule, replace the Webhook with a Cron node (e.g., every Monday at 09:00).
6. Verify MongoDB Collection: Ensure the collection (default: productPrices) exists or let n8n create it on first run.
7. Set Recipients: In the Mailgun node, update the to, from, and subject fields.
8. Execute Test Run: Manually trigger the Webhook URL or run the workflow once to verify data flow and email delivery.
9. Activate: Toggle the workflow to “Active” so it runs automatically each week.

Node Descriptions

Core Workflow Nodes:

- **Webhook** – Entry point that accepts a GET/POST call to start the job.
- **Code (Prepare Products)** – Outputs an array of products to monitor.
- **Split In Batches** – Limits scraping to N products per request to avoid banning.
- **ScrapeGraphAI** – Scrapes the HTML of a product page and parses pricing data.
- **Merge** – Re-combines batch results for streamlined processing.
- **MongoDB** – Inserts or updates each product’s price history document.
- **If** – Determines whether price deviates > X% from the season average.
- **Set** – Builds an HTML/text email body containing the findings.
- **Mailgun** – Sends the email via Mailgun REST API.
- **Respond to Webhook** – Returns an HTTP response for logging/monitoring.
- **Sticky Notes** – Provide in-workflow documentation (no execution).

Data Flow:

1. Webhook → Code → Split In Batches
2. Split In Batches → ScrapeGraphAI → Merge
3. Merge → MongoDB → If
4. If (true) → Set → Mailgun → Respond to Webhook

Customization Examples

Change Scraping Frequency (Cron)

    // Cron node settings
    {
      "mode": "custom",
      "cronExpression": "0 6 * * 1,4" // Monday & Thursday 06:00
    }

Extend Data Points (Reviews Count, Stock)

    // In ScrapeGraphAI extraction config
    {
      "price": "css:span.price",
      "inStock": "css:div.availability",
      "reviewCount": "regex:\"(\\d+) reviews\""
    }

Data Output Format

The workflow outputs structured JSON data:

    {
      "productId": "SKU-12345",
      "productName": "Women's Winter Jacket",
      "timestamp": "2024-09-15T00:00:00Z",
      "price": 79.99,
      "currency": "USD",
      "source": "example-shop.com",
      "trend": "5% below 3-month average"
    }

Troubleshooting

Common Issues

- ScrapeGraphAI returns empty data – Confirm selectors/XPath are correct; test with ScrapeGraphAI playground.
- MongoDB connection fails – Verify IP-whitelisting for Atlas or network connectivity for self-hosted instance.
- Mail not delivered – Check Mailgun logs for bounce or spam rejection, and ensure from domain is verified.

Performance Tips

- Use smaller batch sizes (e.g., 5 URLs) to avoid target site rate-limit blocks.
- Cache static product info; scrape only fields that change (price, stock).

Pro Tips:

- Integrate the IF node with n8n’s Slack node to push urgent price drops to a channel.
- Add a Function node to calculate moving averages for deeper analysis.
- Store raw HTML snapshots in S3/MinIO for auditability and debugging.
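The seasonal trend check and the moving-average tip above can be sketched in plain JavaScript, roughly as it might appear inside a Code or Function node. This is a minimal sketch under assumptions: the helper names `movingAverage` and `checkTrend`, the default 5% threshold, and the 12-point window are illustrative and not taken from the template itself.

```javascript
// Hypothetical helper: average of the most recent `windowSize` prices.
function movingAverage(prices, windowSize) {
  const recent = prices.slice(-windowSize);
  if (recent.length === 0) return null;
  const sum = recent.reduce((acc, p) => acc + p, 0);
  return sum / recent.length;
}

// Compare the current price with the historical average and flag deviations
// larger than `thresholdPct` percent (the "X%" in the If node description).
// The default threshold and window size are assumptions for illustration.
function checkTrend(history, currentPrice, thresholdPct = 5, windowSize = 12) {
  const average = movingAverage(history, windowSize);
  if (average === null || average === 0) {
    // Not enough history to judge; treat as "no anomaly".
    return { anomaly: false, deviationPct: null, average };
  }
  const deviationPct = ((currentPrice - average) / average) * 100;
  return {
    anomaly: Math.abs(deviationPct) > thresholdPct,
    deviationPct: Math.round(deviationPct * 10) / 10, // one decimal place
    average,
  };
}

// Example: four stored price points at 100, current price 90.
const history = [100, 100, 100, 100];
const result = checkTrend(history, 90);
console.log(result);
```

In a real workflow the `history` array would come from the MongoDB price-history documents, and the `anomaly` flag would drive the If node's true branch.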
by Onur
Lead Sourcing by Job Posts For Outreach With Scrape.do API & Open AI & Google Sheets

Overview

This n8n workflow automates the complete lead generation process by scraping job postings from Indeed, enriching company data via Apollo.io, identifying decision-makers, and generating personalized LinkedIn outreach messages using OpenAI. It integrates with Scrape.do for reliable web scraping, Apollo.io for B2B data enrichment, OpenAI for AI-powered personalization, and Google Sheets for centralized data storage.

Perfect for: Sales teams, recruiters, business development professionals, and marketing agencies looking to automate their outbound prospecting pipeline.

Workflow Components

1. ⏰ Schedule Trigger

| Property | Value |
|----------|-------|
| Type | Schedule Trigger |
| Purpose | Automatically initiates workflow on a recurring schedule |
| Frequency | Weekly (Every Monday) |
| Time | 00:00 UTC |

Function: Ensures consistent, hands-off lead generation by running the pipeline automatically without manual intervention.

2. 🔍 Scrape.do Indeed API

| Property | Value |
|----------|-------|
| Type | HTTP Request (GET) |
| Purpose | Scrapes job listings from Indeed via Scrape.do proxy API |
| Endpoint | https://api.scrape.do |
| Output Format | Markdown |

Request Parameters:

| Parameter | Value | Description |
|-----------|-------|-------------|
| token | API Token | Scrape.do authentication |
| url | Indeed Search URL | Target job search page |
| super | true | Uses residential proxies |
| geoCode | us | US-based content |
| render | true | JavaScript rendering enabled |
| device | mobile | Mobile viewport for cleaner HTML |
| output | markdown | Lightweight text output |

Function: Fetches Indeed job listings with anti-bot bypass, returning clean markdown for easy parsing.

3. 📋 Parse Indeed Jobs

| Property | Value |
|----------|-------|
| Type | Code Node (JavaScript) |
| Purpose | Extracts structured job data from markdown |
| Mode | Run once for all items |

Extracted Fields:

| Field | Description | Example |
|-------|-------------|---------|
| jobTitle | Position title | "Senior Data Engineer" |
| jobUrl | Indeed job link | "https://indeed.com/viewjob?jk=abc123" |
| jobId | Indeed job identifier | "abc123" |
| companyName | Hiring company | "Acme Corporation" |
| location | City, State | "San Francisco, CA" |
| salary | Pay range | "$120,000 - $150,000" |
| jobType | Employment type | "Full-time" |
| source | Data source | "Indeed" |
| dateFound | Scrape date | "2025-01-15" |

Function: Parses markdown using regex patterns, filters invalid entries, and deduplicates by company name.

4. 📊 Add New Company (Google Sheets)

| Property | Value |
|----------|-------|
| Type | Google Sheets Node |
| Purpose | Stores parsed job postings for tracking |
| Operation | Append rows |
| Target Sheet | "Add New Company" |

Function: Creates a historical record of all discovered job postings and companies for pipeline tracking.

5. 🏢 Apollo Organization Search

| Property | Value |
|----------|-------|
| Type | HTTP Request (POST) |
| Purpose | Enriches company data via Apollo.io API |
| Endpoint | https://api.apollo.io/v1/organizations/search |
| Authentication | HTTP Header Auth (x-api-key) |

Request Body:

    {
      "q_organization_name": "Company Name",
      "page": 1,
      "per_page": 1
    }

Response Fields:

| Field | Description |
|-------|-------------|
| id | Apollo organization ID |
| name | Official company name |
| website_url | Company website |
| linkedin_url | LinkedIn company page |
| industry | Business sector |
| estimated_num_employees | Company size |
| founded_year | Year established |
| city, state, country | Location details |
| short_description | Company overview |

Function: Retrieves comprehensive company intelligence including LinkedIn profiles, industry classification, and employee count.

6. 📤 Extract Apollo Org Data

| Property | Value |
|----------|-------|
| Type | Code Node (JavaScript) |
| Purpose | Parses Apollo response and merges with original data |
| Mode | Run once for each item |

Function: Extracts relevant fields from Apollo API response and combines with job posting data for downstream processing.

7. 👥 Apollo People Search

| Property | Value |
|----------|-------|
| Type | HTTP Request (POST) |
| Purpose | Finds decision-makers at target companies |
| Endpoint | https://api.apollo.io/v1/mixed_people/search |
| Authentication | HTTP Header Auth (x-api-key) |

Request Body:

    {
      "organization_ids": ["apollo_org_id"],
      "person_titles": [
        "CTO",
        "Chief Technology Officer",
        "VP Engineering",
        "Head of Engineering",
        "Engineering Manager",
        "Technical Director",
        "CEO",
        "Founder"
      ],
      "page": 1,
      "per_page": 3
    }

Response Fields:

| Field | Description |
|-------|-------------|
| first_name | Contact first name |
| last_name | Contact last name |
| title | Job title |
| email | Email address |
| linkedin_url | LinkedIn profile URL |
| phone_number | Direct phone |

Function: Identifies key stakeholders and decision-makers based on configurable title filters.

8. 📝 Format Leads

| Property | Value |
|----------|-------|
| Type | Code Node (JavaScript) |
| Purpose | Structures lead data for outreach |
| Mode | Run once for all items |

Function: Combines person data with company context, creating comprehensive lead profiles ready for personalization.

9. 🤖 Generate Personalized Message (OpenAI)

| Property | Value |
|----------|-------|
| Type | OpenAI Node |
| Purpose | Creates custom LinkedIn connection messages |
| Model | gpt-4o-mini |
| Max Tokens | 150 |
| Temperature | 0.7 |

System Prompt:

You are a professional outreach specialist. Write personalized LinkedIn connection request messages. Keep messages under 300 characters. Be friendly, professional, and mention a specific reason for connecting based on their role and company.

User Prompt Variables:

| Variable | Source |
|----------|--------|
| Name | $json.fullName |
| Title | $json.title |
| Company | $json.companyName |
| Industry | $json.industry |
| Job Context | $json.jobTitle |

Function: Generates unique, contextual outreach messages that reference specific hiring activity and company details.

10. 🔗 Merge Lead + Message

| Property | Value |
|----------|-------|
| Type | Code Node (JavaScript) |
| Purpose | Combines lead data with generated message |
| Mode | Run once for each item |

Function: Merges OpenAI response with lead profile, creating the final enriched record.

11. 💾 Save Leads to Sheet

| Property | Value |
|----------|-------|
| Type | Google Sheets Node |
| Purpose | Stores final lead data with personalized messages |
| Operation | Append rows |
| Target Sheet | "Leads" |

Data Mapping:

| Column | Data |
|--------|------|
| First Name | Lead's first name |
| Last Name | Lead's last name |
| Title | Job title |
| Company | Company name |
| LinkedIn URL | Profile link |
| Country | Location |
| Industry | Business sector |
| Date Added | Timestamp |
| Source | "Indeed + Apollo" |
| Personalized Message | AI-generated outreach text |

Function: Creates actionable lead database ready for outreach campaigns.

Workflow Flow

    ⏰ Schedule Trigger
        │
        ▼
    🔍 Scrape.do Indeed API ──► Fetches job listings with JS rendering
        │
        ▼
    📋 Parse Indeed Jobs ──► Extracts company names, job details
        │
        ▼
    📊 Add New Company ──► Saves to Google Sheets (Companies)
        │
        ▼
    🏢 Apollo Org Search ──► Enriches company data
        │
        ▼
    📤 Extract Apollo Org Data ──► Parses API response
        │
        ▼
    👥 Apollo People Search ──► Finds decision-makers
        │
        ▼
    📝 Format Leads ──► Structures lead profiles
        │
        ▼
    🤖 Generate Personalized Message ──► AI creates custom outreach
        │
        ▼
    🔗 Merge Lead + Message ──► Combines all data
        │
        ▼
    💾 Save Leads to Sheet ──► Final storage (Leads)

Configuration Requirements

API Keys & Credentials

| Credential | Purpose | Where to Get |
|------------|---------|--------------|
| Scrape.do API Token | Web scraping with anti-bot bypass | scrape.do/dashboard |
| Apollo.io API Key | B2B data enrichment | apollo.io/settings/integrations |
| OpenAI API Key | AI message generation | platform.openai.com |
| Google Sheets OAuth2 | Data storage | n8n Credentials Setup |

n8n Credential Setup

| Credential Type | Configuration |
|-----------------|---------------|
| HTTP Header Auth (Apollo) | Header: x-api-key, Value: Your Apollo API key |
| OpenAI API | API Key: Your OpenAI API key |
| Google Sheets OAuth2 | Complete OAuth flow with Google |

Key Features

🔍 Intelligent Job Scraping

- **Anti-Bot Bypass:** Residential proxy rotation via Scrape.do
- **JavaScript Rendering:** Full headless browser for dynamic content
- **Mobile Optimization:** Cleaner HTML with mobile viewport
- **Markdown Output:** Lightweight, easy-to-parse format

🏢 B2B Data Enrichment

- **Company Intelligence:** Industry, size, location, LinkedIn
- **Decision-Maker Discovery:** Title-based filtering
- **Contact Information:** Email, phone, LinkedIn profiles
- **Real-Time Data:** Fresh information from Apollo.io

🤖 AI-Powered Personalization

- **Contextual Messages:** References specific hiring activity
- **Character Limit:** Optimized for LinkedIn (300 chars)
- **Variable Temperature:** Balanced creativity and consistency
- **Role-Specific:** Tailored to recipient's title and company

📊 Automated Data Management

- **Dual Sheet Storage:** Companies + Leads separation
- **Timestamp Tracking:** Historical records
- **Deduplication:** Prevents duplicate entries
- **Ready for Export:** CSV-compatible format

Use Cases

🎯 Sales Prospecting

- Identify companies actively hiring in your target market
- Find decision-makers at companies investing in growth
- Generate personalized cold outreach at scale
- Track pipeline from discovery to contact

👥 Recruiting & Talent Acquisition

- Monitor competitor hiring patterns
- Identify companies building specific teams
- Connect with hiring managers directly
- Build talent pipeline relationships

📈 Market Intelligence

- Track industry hiring trends
- Monitor competitor expansion signals
- Identify emerging market opportunities
- Benchmark salary ranges by role

🤝 Partnership Development

- Find companies investing in complementary areas
- Identify potential integration partners
- Connect with technical leadership
- Build strategic relationship pipeline

Technical Notes

| Specification | Value |
|---------------|-------|
| Processing Time | 2-5 minutes per run (depending on job count) |
| Jobs per Run | ~25 unique companies |
| API Calls per Run | 1 Scrape.do + 25 Apollo Org + 25 Apollo People + ~75 OpenAI |
| Data Accuracy | 90%+ for company matching |
| Success Rate | 99%+ with proper error handling |

Rate Limits to Consider

| Service | Free Tier Limit | Recommendation |
|---------|-----------------|----------------|
| Scrape.do | 1,000 credits/month | ~40 runs/month |
| Apollo.io | 100 requests/day | Add Wait nodes if needed |
| OpenAI | Based on usage | Monitor costs (~$0.01-0.05/run) |
| Google Sheets | 300 requests/minute | No issues expected |

Setup Instructions

Step 1: Import Workflow

1. Copy the JSON workflow configuration
2. In n8n: Workflows → Import from JSON
3. Paste configuration and save

Step 2: Configure Scrape.do

1. Sign up at scrape.do
2. Navigate to Dashboard → API Token
3. Copy your token
4. Token is embedded in URL query parameter (already configured)

To customize search: Change the url parameter in "Scrape.do Indeed API" node:

- q=data+engineer (search term)
- l=Remote (location)
- fromage=7 (last 7 days)

Step 3: Configure Apollo.io

1. Sign up at apollo.io
2. Go to Settings → Integrations → API Keys
3. Create new API key
4. In n8n: Credentials → Add Credential → Header Auth
5. Name: x-api-key, Value: Your Apollo API key
6. Select this credential in both Apollo HTTP nodes

Step 4: Configure OpenAI

1. Go to platform.openai.com
2. Create new API key
3. In n8n: Credentials → Add Credential → OpenAI
4. Paste API key
5. Select credential in "Generate Personalized Message" node

Step 5: Configure Google Sheets

1. Create new Google Spreadsheet
2. Create two sheets:
   - Sheet 1: "Add New Company". Columns: companyName | jobTitle | jobUrl | location | salary | source | postedDate
   - Sheet 2: "Leads". Columns: First Name | Last Name | Title | Company | LinkedIn URL | Country | Industry | Date Added | Source | Personalized Message
3. Copy Sheet ID from URL
4. In n8n: Credentials → Add Credential → Google Sheets OAuth2
5. Update both Google Sheets nodes with your Sheet ID

Step 6: Test and Activate

1. Manual Test: Click "Execute Workflow" button
2. Verify Each Node: Check outputs step by step
3. Review Data: Confirm data appears in Google Sheets
4. Activate: Toggle workflow to "Active"

Error Handling

Common Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| "Invalid character: " | Empty/malformed company name | Check Parse Indeed Jobs output |
| "Node does not have credentials" | Credential not linked | Open node → Select credential |
| Empty Parse Results | Indeed HTML structure changed | Check Scrape.do raw output |
| Apollo Rate Limit (429) | Too many requests | Add 5-10s Wait node between calls |
| OpenAI Timeout | Too many tokens | Reduce batch size or max_tokens |
| "Your request is invalid" | Malformed JSON body | Verify expression syntax in HTTP nodes |

Troubleshooting Steps

1. Verify Credentials: Test each credential individually
2. Check Node Outputs: Use "Execute Node" for debugging
3. Monitor API Usage: Check Apollo and OpenAI dashboards
4. Review Logs: Check n8n execution history for details
5. Test with Sample: Use known company name to verify Apollo

Recommended Error Handling Additions

For production use, consider adding:

- IF node after Apollo Org Search to handle empty results
- Error Workflow trigger for notifications
- Wait nodes between API calls for rate limiting
- Retry logic for transient failures

Performance Specifications

| Metric | Value |
|--------|-------|
| Execution Time | 2-5 minutes per scheduled run |
| Jobs Discovered | ~25 per Indeed page |
| Leads Generated | 1-3 per company (based on title matches) |
| Message Quality | Professional, contextual, <300 chars |
| Data Freshness | Real-time from Indeed + Apollo |
| Storage Format | Google Sheets (unlimited rows) |

API Reference

Scrape.do API

| Endpoint | Method | Purpose |
|----------|--------|---------|
| https://api.scrape.do | GET | Direct URL scraping |

Documentation: scrape.do/documentation

Apollo.io API

| Endpoint | Method | Purpose |
|----------|--------|---------|
| /v1/organizations/search | POST | Company lookup |
| /v1/mixed_people/search | POST | People search |

Documentation: apolloio.github.io/apollo-api-docs

OpenAI API

| Endpoint | Method | Purpose |
|----------|--------|---------|
| /v1/chat/completions | POST | Message generation |

Documentation: platform.openai.com
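The cleanup that the Parse Indeed Jobs node performs after regex parsing (filtering invalid entries and deduplicating by company name) could look roughly like the sketch below. It is not the template's actual code: the helper names `extractJobId` and `cleanJobs` are hypothetical, and it assumes the job id sits in the `jk` query parameter, as in the example `jobUrl` shown in the Extracted Fields table.

```javascript
// Hypothetical helper: read the Indeed job id from the `jk` query parameter.
function extractJobId(jobUrl) {
  try {
    return new URL(jobUrl).searchParams.get("jk");
  } catch {
    return null; // not a parseable absolute URL
  }
}

// Drop entries without a company name or job id, then keep only the first
// posting seen for each company (case-insensitive match on the name).
function cleanJobs(jobs) {
  const seen = new Set();
  const result = [];
  for (const job of jobs) {
    const companyName = (job.companyName || "").trim();
    const jobId = extractJobId(job.jobUrl || "");
    if (!companyName || !jobId) continue; // invalid entry
    const key = companyName.toLowerCase();
    if (seen.has(key)) continue; // duplicate company
    seen.add(key);
    result.push({ ...job, companyName, jobId, source: "Indeed" });
  }
  return result;
}

// Sample input shaped like the Extracted Fields table (values are made up).
const sample = [
  { jobTitle: "Senior Data Engineer", companyName: "Acme Corporation", jobUrl: "https://indeed.com/viewjob?jk=abc123" },
  { jobTitle: "Data Engineer", companyName: "acme corporation ", jobUrl: "https://indeed.com/viewjob?jk=def456" },
  { jobTitle: "Analyst", companyName: "", jobUrl: "https://indeed.com/viewjob?jk=ghi789" },
  { jobTitle: "ML Engineer", companyName: "Globex", jobUrl: "https://indeed.com/viewjob?jk=xyz789" },
];
const cleaned = cleanJobs(sample);
console.log(cleaned.map((j) => `${j.companyName}: ${j.jobId}`));
```

Inside an n8n Code node running once for all items, the input array would typically be built from `$input.all()` rather than a hard-coded sample.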
by Cheng Siong Chin
Introduction

Automates flight deal discovery and intelligent analysis for travel bloggers and deal hunters. The workflow scrapes live pricing, enriches it with weather data, applies AI evaluation, and auto-publishes to WordPress, eliminating manual research and accelerating content delivery.

How It Works

The user submits a route via a form. The workflow then scrapes real-time flight prices and weather data, has an AI model analyze deal quality in light of the weather conditions, formats the results, publishes them to WordPress, and sends a Slack notification. The process is fully automated from input to publication.

Workflow Template

Form Input → Extract Data → Scrape Flight Prices → Extract Pricing → Fetch Weather → Parse Weather → Prepare AI Input → AI Analysis → Parse Output → Format Results → Publish WordPress → Slack Alert → User Response

Setup Instructions

- Form Setup: Configure user input fields for flight routes and preferences
- APIs: Connect Google Flights scraping endpoint, weather API credentials, OpenAI/Chat Model API key
- Publishing: Set WordPress credentials, target blog category, Slack webhook URL
- AI Configuration: Define analysis prompts, output structure, parser rules

Workflow Steps

- Data Collection: Form captures route, scrapes Google Flights pricing, fetches destination weather via API
- AI Processing: Enriches flight data with weather context, analyzes deal quality using OpenAI/Chat Model with structured output parsing
- Publishing: Formats analysis results, creates WordPress post, sends Slack notification, delivers response to user

Prerequisites

n8n instance, Google Flights access, weather API key, OpenAI/compatible AI service, WordPress site with API access, Slack workspace

Use Cases

Travel blog automation, flight deal newsletters, price comparison services, seasonal travel planning, destination weather analysis, automated social media content

Customization

Modify AI analysis criteria, adjust weather impact weighting, customize WordPress post templates, add email distribution, integrate additional data sources, expand to hotel/rental deals
Benefits

Eliminates manual price checking, combines multiple data sources automatically, delivers AI-enhanced insights, accelerates publishing workflow, scales across unlimited routes, provides weather-aware recommendations
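As a rough illustration of the "Prepare AI Input" step in the flight deal workflow, the sketch below merges flight pricing and weather data into a single prompt. Everything here is an assumption made for illustration: the function name `prepareAiInput`, the field names on `flight` and `weather`, and the requested `rating`/`reason` output keys are hypothetical, and the real shapes depend on the scraping endpoint, weather API, and parser rules you configure.

```javascript
// Hypothetical input shapes; the real field names depend on the scraper and
// weather API used in the workflow.
function prepareAiInput(flight, weather) {
  const lines = [
    `Route: ${flight.origin} -> ${flight.destination}`,
    `Departure date: ${flight.departureDate}`,
    `Price: ${flight.price} ${flight.currency}`,
    `Destination weather: ${weather.summary}, ${weather.temperatureC} °C`,
  ];
  return {
    // The instruction asks for JSON so a structured output parser can read it.
    prompt:
      lines.join("\n") +
      '\nRate this deal and reply as JSON with keys "rating" and "reason".',
    metadata: {
      route: `${flight.origin}-${flight.destination}`,
      price: flight.price,
    },
  };
}

// Example with made-up values.
const aiInput = prepareAiInput(
  { origin: "SIN", destination: "NRT", departureDate: "2025-03-10", price: 420, currency: "USD" },
  { summary: "clear skies", temperatureC: 12 }
);
console.log(aiInput.prompt);
```

Keeping the route and price in a separate `metadata` object means later nodes (formatting, WordPress, Slack) can use them without re-parsing the prompt text.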
by Hiroshi Hashimoto
AI Handwritten Memo Organizer – Overview

This workflow receives handwritten memo images sent via LINE and automatically extracts, summarizes, and organizes the content using AI.

Step-by-step process:

1. User sends a handwritten memo image via LINE
2. Webhook receives the image
3. Immediate reply is sent: “Processing…”
4. Image is saved to Google Drive
5. AI performs OCR and generates structured data (title, category, summary, tags)
6. The JSON response is safely parsed with error handling
7. OCR failure is detected if text cannot be properly extracted
8. If OCR fails:
   → User is notified with guidance for retaking the image
9. If OCR succeeds:
   → Check if the category sheet exists in Google Sheets
   → If not, create a new sheet
   → Save the data (title, summary, tags, date, image URL)
10. Completion message is sent to the user via LINE

Setup Steps

1. Create a LINE Messaging API channel and obtain the Channel Access Token
2. Create a Google Spreadsheet for storing memo data
3. Create a Google Drive folder to store uploaded images
4. Set the following values in the Config node:
   - LINE_ACCESS_TOKEN
   - GOOGLE_SHEETS_ID
5. Set the Webhook URL in the LINE Developers Console
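The "safely parsed with error handling" and "OCR failure is detected" steps of the memo organizer could be sketched as a single function, as below. This is a minimal sketch under assumptions rather than the workflow's actual code: the name `parseMemoResponse`, the `reason` codes, the rule that an empty title or summary counts as an OCR failure, and the "Uncategorized" fallback are all illustrative choices.

```javascript
// Parse the AI reply into { title, category, summary, tags } without throwing.
// Returns { ok: false, reason } on any failure so the workflow can branch.
function parseMemoResponse(rawText) {
  const text = rawText == null ? "" : String(rawText);
  // Take the outermost {...} span so extra text around the JSON is tolerated.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return { ok: false, reason: "no_json_found" };
  }
  let data;
  try {
    data = JSON.parse(text.slice(start, end + 1));
  } catch (err) {
    return { ok: false, reason: "invalid_json" };
  }
  const title = typeof data.title === "string" ? data.title.trim() : "";
  const summary = typeof data.summary === "string" ? data.summary.trim() : "";
  // Assumption: an empty title or summary means the text could not be read.
  if (!title || !summary) {
    return { ok: false, reason: "ocr_failed" };
  }
  const category =
    typeof data.category === "string" && data.category.trim()
      ? data.category.trim()
      : "Uncategorized"; // hypothetical fallback sheet name
  return {
    ok: true,
    memo: {
      title,
      category,
      summary,
      tags: Array.isArray(data.tags) ? data.tags.map(String) : [],
    },
  };
}

// Example with a made-up AI reply that has extra text around the JSON.
const reply = 'Here is the result: {"title":"Shopping list","category":"Home","summary":"Milk and eggs","tags":["food","todo"]}';
const parsed = parseMemoResponse(reply);
console.log(parsed);
```

The `ok` flag maps onto the workflow's two branches: `ok: false` leads to the retake guidance message, and `ok: true` continues to the Google Sheets check and save.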