by Omer Fayyaz
An intelligent web scraping workflow that automatically routes URLs to site-specific extraction logic, normalizes data across multiple sources, and filters content by freshness to build a unified article feed.

## What Makes This Different

- **Intelligent Source Routing** - Uses a Switch node to route URLs to specialized extractors based on source identifier, enabling custom CSS selectors per publisher for maximum accuracy
- **Universal Fallback Parser** - Advanced regex-based extractor handles unknown sources automatically, extracting title, description, author, date, and images from meta tags and HTML patterns
- **Freshness Filtering** - Built-in 45-day freshness threshold filters outdated content before saving, with configurable date validation logic
- **Tier-Based Classification** - Automatically categorizes articles into Tier 1 (0-7 days), Tier 2 (8-14 days), Tier 3 (15-30 days), or Archive based on publication date
- **Rate Limiting & Error Handling** - Built-in 3-second delays between requests prevent server overload, with comprehensive error handling that continues processing even if individual URLs fail
- **Status Tracking** - Updates the source spreadsheet with processing status, enabling easy monitoring and retry logic for failed extractions

## Key Benefits of Multi-Source Content Aggregation

- **Scalable Architecture** - Easily add new sources by adding a Switch rule and an extraction node; no code changes needed for most sites
- **Data Normalization** - Standardizes extracted data across all sources into a consistent format (title, description, author, date, image, canonical URL)
- **Automated Processing** - Schedule-based execution (every 4 hours) or manual triggers keep your feed updated without manual intervention
- **Quality Control** - Freshness filtering ensures only recent, relevant content enters your feed, reducing noise from outdated articles
- **Flexible Input** - Reads from Google Sheets, making it easy to add URLs in bulk or integrate with other systems
- **Comprehensive Metadata** - Captures full article metadata including canonical URLs, publication dates, author information, and featured images

## Who's it for

This template is designed for content aggregators, news monitoring services, content marketers, SEO professionals, researchers, and anyone who needs to collect and normalize articles from multiple websites. It's perfect for organizations that need to monitor competitor content, aggregate industry news, build content databases, track publication trends, or create unified article feeds without manually scraping each site or writing custom scrapers for every source.

## How it works / What it does

This workflow creates a unified article aggregation system that reads URLs from Google Sheets, routes them to site-specific extractors, normalizes the data, filters by freshness, and saves results to a feed.
The system:

1. **Reads Pending URLs** - Fetches URLs with source identifiers from Google Sheets, filtering for entries with "Pending" status
2. **Processes with Rate Limiting** - Loops through URLs one at a time with a 3-second delay between requests to respect server resources
3. **Fetches HTML Content** - Downloads page HTML with proper browser headers (User-Agent, Accept, Accept-Language) to avoid blocking
4. **Routes by Source** - Switch node directs URLs to specialized extractors (Site A, B, C, D) or the universal fallback parser based on the Source field
5. **Extracts Article Data** - Site-specific HTML nodes use custom CSS selectors, while the fallback uses regex patterns to extract title, description, author, date, image, and canonical URL
6. **Normalizes Data** - Standardizes all extracted fields into a consistent format, handling missing values and trimming whitespace
7. **Filters by Freshness** - Validates publication dates and filters out articles older than 45 days (configurable threshold)
8. **Calculates Tier & Status** - Assigns tier classification and freshness status based on article age
9. **Saves to Feed** - Appends normalized articles to the Article Feed sheet with all metadata
10. **Updates Status** - Marks processed URLs as complete in the source sheet for tracking

**Key Innovation: Source-Based Routing** - Unlike generic scrapers that use one-size-fits-all extraction, this workflow uses intelligent routing to apply site-specific CSS selectors. This dramatically improves extraction accuracy while maintaining a universal fallback for unknown sources, making it both precise and extensible.

## How to set up

**1. Prepare Google Sheets**

- Create a Google Sheet with two tabs: "URLs to Process" and "Article Feed"
- In the "URLs to Process" sheet, create columns: URL, Source, Status
- Add sample data: URLs in the URL column, source identifiers (e.g., "Site A", "Site B") in the Source column, and "Pending" in the Status column
- In the "Article Feed" sheet, the workflow will automatically create columns: Title, Description, Author, datePublished, imageUrl, canonicalUrl, source, sourceUrl, tier, freshnessStatus, extractedAt
- Verify your Google Sheets credentials are set up in n8n (OAuth2 recommended)

**2. Configure Google Sheets Nodes**

- Open the "Read Pending URLs" node and select your spreadsheet from the document dropdown
- Set the sheet name to "URLs to Process"
- Configure the "Save to Article Feed" node: select the same spreadsheet, set the sheet name to "Article Feed", operation should be "Append or Update"
- Configure the "Update URL Status" node: same spreadsheet, "URLs to Process" sheet, operation "Update"
- Test the connection by running the "Read Pending URLs" node manually to verify it can access your sheet

**3. Customize Source Routing**

- Open the "Source Router" (Switch node) to see the current routing rules for Site A, B, C, D, and the fallback
- To add a new source: click "Add Rule" and set the condition {{ $('Loop Over URLs').item.json.Source }} equals your source name
- Create a new HTML extraction node for your source with appropriate CSS selectors
- Connect the new extractor to the "Normalize Extracted Data" node
- Update the Switch node to route to your new extractor

Example CSS selectors for common sites:

```
// WordPress sites
title: "h1.entry-title, .post-title"
author: ".author-name, .byline a"
date: "time.entry-date, time[datetime]"

// Modern CMS
title: "h1.article__title, article h1"
author: ".article__byline a, a[rel='author']"
date: "time[datetime], meta[property='article:published_time']"
```
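For orientation, the universal fallback parser works along the lines of the sketch below: pull common meta tags out of the raw HTML with regexes. This is an illustrative example only; the template's actual patterns and field names may differ, and the input field (`data`) is an assumption about what the HTTP Request node returns.

```javascript
// Illustrative sketch of a meta-tag/regex fallback extractor (not the template's exact code).
// Assumes the previous node returned the raw page HTML in item.json.data.
const html = $input.first().json.data || '';

// Return the first capture group of the first pattern that matches.
const first = (patterns) => {
  for (const re of patterns) {
    const m = html.match(re);
    if (m && m[1]) return m[1].trim();
  }
  return '';
};

const title = first([
  /<meta[^>]+property=["']og:title["'][^>]+content=["']([^"']+)["']/i,
  /<title[^>]*>([^<]+)<\/title>/i,
]);
const description = first([
  /<meta[^>]+property=["']og:description["'][^>]+content=["']([^"']+)["']/i,
  /<meta[^>]+name=["']description["'][^>]+content=["']([^"']+)["']/i,
]);
const author = first([/<meta[^>]+name=["']author["'][^>]+content=["']([^"']+)["']/i]);
const datePublished = first([
  /<meta[^>]+property=["']article:published_time["'][^>]+content=["']([^"']+)["']/i,
  /<time[^>]+datetime=["']([^"']+)["']/i,
]);
const imageUrl = first([/<meta[^>]+property=["']og:image["'][^>]+content=["']([^"']+)["']/i]);
const canonicalUrl = first([/<link[^>]+rel=["']canonical["'][^>]+href=["']([^"']+)["']/i]);

return [{ json: { title, description, author, datePublished, imageUrl, canonicalUrl } }];
```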
**4. Configure Freshness Threshold**

- Open the "Freshness Filter (45 days)" IF node; the current threshold is 45 days (configurable in the condition expression)
- To change the threshold, modify the expression cutoffDate.setDate(cutoffDate.getDate() - 45) to your desired number of days
- The filter marks articles as "Fresh" (within threshold) or routes them to the "Outdated" handler
- Test with sample URLs to verify date parsing works correctly for your sources

**5. Set Up Scheduling & Test**

- The workflow includes both a Manual Trigger (for testing) and a Schedule Trigger (runs every 4 hours)
- To customize the schedule: open the "Schedule (Every 4 Hours)" node and adjust the interval
- For initial testing: use the Manual Trigger and add 2-3 test URLs to your sheet with Status = "Pending"
- Verify execution: check that URLs are fetched, routed correctly, extracted, and saved to the Article Feed
- Monitor the "Completion Summary" node output to see processing statistics
- Check execution logs for any errors in HTML extraction or date parsing
- Common issues: missing CSS selectors (update the extractor), date format mismatches (adjust date parsing), or rate limiting (increase the wait time if needed)

## Requirements

- **Google Sheets Account** - Active Google account with OAuth2 credentials configured in n8n for reading and writing spreadsheet data
- **Source Spreadsheet** - Google Sheet with "URLs to Process" and "Article Feed" tabs, properly formatted with the required columns
- **n8n Instance** - Self-hosted or cloud n8n instance with access to external websites (the HTTP Request node needs internet connectivity)
- **Source Knowledge** - Understanding of the target website's HTML structure to configure CSS selectors for site-specific extractors (or use the fallback parser for unknown sources)
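As a reference for the "Calculates Tier & Status" step and the 45-day freshness threshold described above, here is a minimal sketch of how such logic could look in a Code node. Field names are illustrative, not the template's exact implementation.

```javascript
// Minimal sketch of tier/freshness classification (field names are assumptions).
const item = $input.first().json;
const ageDays = (Date.now() - new Date(item.datePublished).getTime()) / 86400000;

const FRESHNESS_DAYS = 45; // adjust to match the IF-node threshold

let tier;
if (ageDays <= 7) tier = 'Tier 1';
else if (ageDays <= 14) tier = 'Tier 2';
else if (ageDays <= 30) tier = 'Tier 3';
else tier = 'Archive';

const freshnessStatus = !isNaN(ageDays) && ageDays <= FRESHNESS_DAYS ? 'Fresh' : 'Outdated';

return [{ json: { ...item, tier, freshnessStatus } }];
```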
by Luis Hernandez
## Overview

This comprehensive n8n workflow automates the generation and distribution of detailed monthly technical support reports from GLPI (an IT Service Management platform). The workflow intelligently calculates SLA compliance, analyzes technician performance, and delivers professionally formatted HTML reports via email.

## ✨ Key Features

**Intelligent SLA Calculation**
- Business Hours Tracking: Automatically calculates resolution time considering only working hours (excludes weekends and lunch breaks)
- Configurable Schedule: Customizable work hours (default: 8 AM - 12 PM, 1 PM - 6 PM)
- Dynamic SLA Monitoring: Real-time compliance tracking with configurable thresholds (default: 24 hours)
- Visual Indicators: Color-coded alerts for critical SLA breaches and high-volume warnings

**Comprehensive Reporting**
- General Summary: Total cases, open, in-progress, resolved, and closed tickets
- Performance Metrics: Total and average resolution hours in both decimal and formatted (hours/minutes) display
- Technician Breakdown: Individual performance analysis per technician, including case distribution and SLA compliance
- Smart Alerts: Automatic warnings for high case volumes (>100 in-progress) and critical SLA levels (<50%)

**Professional Email Delivery**
- Responsive HTML Design: Mobile-optimized email templates with elegant styling
- Dynamic Content: Conditional formatting based on performance metrics
- Automatic Scheduling: Monthly execution on the 6th day to ensure accurate SLA measurement

## 💼 Business Benefits

**Time Savings**
- Eliminates Manual Work: Saves 2-4 hours per month previously spent compiling reports manually
- Automated Data Collection: No more exporting CSVs or copying data between systems
- One-Click Setup: Configure once and receive reports automatically every month

**Improved Decision Making**
- Real-Time Insights: Identify bottlenecks and performance issues immediately
- Technician Accountability: Clear visibility into individual and team performance
- SLA Compliance Tracking: Proactively manage service level agreements before they become critical

**Enhanced Communication**
- Stakeholder Ready: Professional reports suitable for management presentations
- Consistent Format: Standardized metrics ensure month-over-month comparability
- Instant Distribution: Automatic email delivery to relevant stakeholders

## 🔧 Technical Specifications

**Requirements**
- n8n instance (self-hosted or cloud)
- GLPI server with API access enabled
- Gmail account (or any SMTP-compatible email service)
- GLPI API credentials (App-Token and user credentials)

**Configuration Points**
- Variables Node: Server URL, API tokens, entity name, work hours, SLA limits
- Schedule Trigger: Monthly execution timing (default: 6th of each month)
- Email Recipient: Target email address for report delivery
- Date Range Logic: Automatic previous-month calculation

**Data Processing**
- Retrieves up to 999 tickets per execution (configurable)
- Filters by entity and date range
- Excludes weekends and non-business hours from calculations
- Groups data by technician for detailed analysis

## 📋 Setup Instructions

**Prerequisites**
- GLPI Configuration: Enable the API and configure the Tickets panel with the required fields (ID, Title, Status, Opening Date, Closing Date, Resolution Date, Priority, Requester, Assigned To)
- API Credentials: Create Basic Auth credentials in n8n for GLPI API access
- Email Authentication: Set up Gmail OAuth2 or SMTP credentials in n8n

**Implementation Steps**
1. Import the workflow JSON into your n8n instance
2. Configure the Variables node with your GLPI server details and business hours
3. Set up GLPI API credentials in the HTTP Request nodes
4. Configure email credentials in the Gmail node
5. Update the recipient email address
6. Test the workflow manually before enabling the schedule
7. Activate the workflow for automatic monthly execution

## 🎯 Use Cases

- IT Support Teams: Track helpdesk performance and SLA compliance
- Service Managers: Monitor team productivity and identify training needs
- Executive Reporting: Provide high-level summaries to stakeholders
- Resource Planning: Identify workload distribution and capacity issues
- Compliance Auditing: Maintain historical records of SLA performance

## 📈 ROI Impact

- Time Savings: 24-48 hours annually of manual reporting eliminated
- Error Reduction: Eliminates human calculation errors in SLA tracking
- Faster Response: Early alerts enable proactive issue resolution
- Better Visibility: Data-driven insights improve team management
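To make the "Business Hours Tracking" idea concrete, here is a minimal sketch of a business-hours calculation like the one this workflow performs, assuming the default schedule (08:00-12:00 and 13:00-18:00, weekends excluded). The template's actual Variables/Code nodes may implement this differently.

```javascript
// Minimal sketch, assuming the default work blocks; not the template's exact code.
function businessHoursBetween(start, end) {
  const WORK_BLOCKS = [[8, 12], [13, 18]]; // [startHour, endHour) pairs
  let total = 0;
  const cursor = new Date(start);
  cursor.setMinutes(0, 0, 0);

  while (cursor < end) {
    const day = cursor.getDay(); // 0 = Sunday, 6 = Saturday
    const hour = cursor.getHours();
    const isWorkHour =
      day !== 0 && day !== 6 &&
      WORK_BLOCKS.some(([from, to]) => hour >= from && hour < to);

    if (isWorkHour) {
      // Count only the portion of this hour that overlaps [start, end]
      const hourEnd = new Date(cursor.getTime() + 60 * 60 * 1000);
      const overlapStart = Math.max(cursor.getTime(), start.getTime());
      const overlapEnd = Math.min(hourEnd.getTime(), end.getTime());
      total += Math.max(0, overlapEnd - overlapStart) / (60 * 60 * 1000);
    }
    cursor.setHours(cursor.getHours() + 1);
  }
  return total; // decimal hours
}

// Example: resolution time for one ticket (Friday morning -> Monday morning)
const opened = new Date('2024-09-06T10:30:00');
const resolved = new Date('2024-09-09T09:00:00');
const hours = businessHoursBetween(opened, resolved);
const slaMet = hours <= 24; // default 24-hour threshold
return [{ json: { resolutionHours: Number(hours.toFixed(2)), slaMet } }];
```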
by Intuz
This n8n template from Intuz provides a complete solution to automate your entire invoicing process. It intelligently syncs confirmed sales orders from your Airtable base to QuickBooks, automatically creating new customers if they don't exist before generating a perfectly matched invoice. It then logs all invoice details back into Airtable, creating a flawless, end-to-end financial workflow.

## Use Cases

1. **Accounting & Finance Teams:** Automatically generate QuickBooks invoices from new orders confirmed in Airtable. Keep all invoices and customer details synced across systems in real time.
2. **Sales & Operations Teams:** Track order status and billing progress directly from Airtable without switching platforms. Ensure every confirmed sale automatically triggers an invoice in QuickBooks.
3. **Business Owners / Admins:** Eliminate double-entry between Airtable and QuickBooks. Maintain accurate, audit-ready financial records with minimal effort.

## How it works

1. **Trigger from Airtable:** The workflow starts instantly when a sales order is ready to be invoiced in your Airtable base (triggered via a webhook).
2. **Check for Customer in QuickBooks:** It searches your QuickBooks account to see if the customer from the sales order already exists.
3. **Create New Customer (If Needed):** If the customer is not found, it automatically creates a new customer record in QuickBooks using the details from your Airtable Customers table.
4. **Create QuickBooks Invoice:** Using the correct customer record (either existing or newly created), it gathers all order line items from Airtable and generates a detailed invoice in QuickBooks.
5. **Log Invoice Back to Airtable:** After the invoice is successfully created, the workflow updates your Airtable base by adding a new record to your Invoices & Payments table and updating the original Confirmed Orders record with the new QuickBooks Invoice ID, marking it as synced.

## Key Requirements to Use This Template

1. **n8n Instance:** An active n8n account (Cloud or self-hosted).
2. **Airtable Base:** An Airtable base on a "Pro" plan or higher with tables for Confirmed Orders, Customers, Order Lines, Product & Service, and Invoices & Payments. Field names must match those in the setup guide.
3. **QuickBooks Online Account:** An active QuickBooks Online account with API access.

## Step-by-Step Setup Instructions

**Step 1: Import and Configure the n8n Workflow**

- **Import Workflow:** In n8n, import the Client-Quickbook-Invoices-via-AirTable.json file.
- **Get Webhook URL:** Click on the first node, "Webhook". Copy the "Test URL". Keep this n8n tab open.
- **Configure Airtable Nodes:** There are six Airtable nodes. For each one, connect your Airtable credentials and select the correct Base and Table.
- **Configure QuickBooks Nodes:** There are four QuickBooks-related nodes. For each one, connect your QuickBooks Online credentials.
- **CRITICAL:** Click on the "Create Invoice URL" (HTTP Request) node. You must edit the URL and replace the placeholder number (9341455145770046) with your own QuickBooks Company ID. (Find this in your QuickBooks account settings under "Billing & Subscription".)
- **Save and Activate:** Click "Save", then toggle the workflow to "Active". After activating, copy the new "Production URL" from the Webhook node.
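For orientation, the "Create Invoice URL" request generally follows the shape of the QuickBooks Online invoice endpoint shown below. This is a sketch: the company ID, customer reference, and item references are placeholders, and the template's actual field mapping from Airtable may differ.

```javascript
// Sketch of the kind of request the "Create Invoice URL" HTTP node sends (values are placeholders).
const companyId = 'YOUR_QUICKBOOKS_COMPANY_ID'; // replaces the 9341455145770046 placeholder
const url = `https://sandbox-quickbooks.api.intuit.com/v3/company/${companyId}/invoice`;

const body = {
  CustomerRef: { value: '58' },          // QuickBooks customer ID (existing or newly created)
  Line: [
    {
      DetailType: 'SalesItemLineDetail',
      Amount: 150.0,
      Description: 'Widget - Blue',
      SalesItemLineDetail: {
        ItemRef: { value: '19' },        // QuickBooks Product/Service ID
        Qty: 3,
        UnitPrice: 50.0,
      },
    },
  ],
};

// In n8n this is sent as POST {url} with OAuth2 QuickBooks credentials;
// swap the sandbox host for the production host when going live.
console.log(url, JSON.stringify(body, null, 2));
```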
## Customization Guide

You can adapt this template for various workflows by tweaking a few nodes:

- **Use a different Airtable Base:** Update the Base ID and Table ID in all Airtable nodes (Get Orders Records, Get Customer Details, Get Products, etc.).
- **Switch from Sandbox to Live QuickBooks:** Replace the Sandbox company ID and endpoint in the "Create Invoice URL" node with your production QuickBooks company ID.
- **Add more invoice details:** Edit the Code and Parse in HTTP nodes to include additional fields (like Tax, Shipping, or Notes).
- **Support multiple currencies:** Add a "Currency" field mapping in both the Airtable and QuickBooks nodes.

## Connect with us

Website: https://www.intuz.com/services
Email: getstarted@intuz.com
LinkedIn: https://www.linkedin.com/company/intuz
Get Started: https://n8n.partnerlinks.io/intuz
For custom workflow automation: Get Started
by Abdullah Alshiekh
## What Problem Does It Solve?

Brands and marketers spend hours manually searching Google for product reviews. Reading through multiple websites to gauge general sentiment is tedious and inefficient, and it is difficult to spot recurring customer complaints or praises without aggregating data. This workflow solves these problems by:

- Instantly searching and scraping review content from the web.
- Using AI to read and score the sentiment of every review found.
- Generating a consolidated "Executive Summary" with key quotes and actionable advice.

## How to Configure It

**Telegram Setup**
- Connect your Telegram Bot credentials in n8n.
- Set the Get Message node to watch for text messages.

**Search & Scraping (Decodo)**
- Connect your Decodo credentials (requires a Web Scraping API plan). This handles both the Google Search and the content extraction.

**AI Setup**
- Add your Google Gemini API key.
- The prompts are pre-configured to act as a "Strict Data Analyst," but you can edit the system prompt in the AI Agent node to match your preferred tone.

## How It Works

- **Trigger:** You send a company or product name (e.g., "XQ Pharma") to your Telegram bot.
- **Search:** The workflow uses Decodo to Google search for "[Name] reviews" and extracts the top URL results.
- **Scrape:** It visits the review pages and strips away the HTML code to get clean text.
- **Analyze (Loop):** The first AI Agent reads the text and determines the sentiment (Positive/Neutral/Negative) and key topics.
- **Report:** A second AI Agent collects all the analysis pieces and writes a final summary containing a **Sentiment Score**, **Customer Voice** (direct quotes), and an **Actionable Verdict**.
- **Delivery:** The final report is sent back to you as a Telegram message.

## Customization Ideas

- **Change the Source:** Modify the search query to target specific platforms (e.g., "site:reddit.com [Product] reviews").
- **Change the Output:** Send the final report to a **Slack channel** or **Email** for your team to see.
- **Database Logging:** Save the "Actionable Verdict" and sentiment scores into **Notion** or **Airtable** to track brand reputation over time.
- **Competitor Analysis:** Use it to research competitor products instead of your own to find their weaknesses.

If you need any help, Get in Touch.
by Khairul Muhtadin
Stop wasting hours manually hunting for business leads. This workflow automates the entire process, from scraping Google Maps to extracting contact emails, all triggered from your phone via Telegram.

## What It Does

Send a single message to your Telegram bot (Sector; Limit; MapsURL) and the system takes over. It scrapes business data from Google Maps using Apify, generates AI-powered company summaries via OpenAI, hunts for contact emails from business websites using Jina AI, then stores everything neatly in Google Sheets.

## Who It's For

Sales reps building cold outreach lists, marketing agencies prospecting new clients, or anyone who needs targeted local business data fast, without paying for overpriced lead databases.

## Why It's Worth It

Manual research that takes 4 hours gets done in under 5 minutes for 50 leads. Pay only for what you use (Apify + OpenAI) instead of fixed monthly subscriptions. AI deduplication keeps your CRM clean and consistent.

## What You'll Need

| Tool | Purpose |
|------|---------|
| n8n | Workflow engine |
| Apify | Google Maps scraper |
| OpenAI API | Summaries & email extraction |
| Google Sheets | Lead storage |
| Telegram Bot | Mobile trigger interface |
| Jina AI | Website-to-text conversion |

## Quick Setup

1. Import the JSON workflow into your n8n instance
2. Connect credentials: Telegram bot token, Apify API key, OpenAI key, Google account
3. Set up your Sheet with the matching column headers
4. Test with: Coffee Shops; 5; https://www.google.com/maps/search/coffee+shops+london

## How the Logic Works

The workflow runs a two-stage loop per business. First it saves core data (name, phone, address). If a website exists, it then attempts email enrichment. This way, you never lose basic lead data even if a website crawl fails. (A sketch of the trigger-message parsing appears at the end of this entry.)

## Extend It Further

Swap Google Sheets for HubSpot or Pipedrive, push results to a Slack sales channel, or chain a Gmail node to auto-send intro emails the moment a lead is found.

Created by: Khaisa Studio
Category: Marketing | Tags: Lead Gen, AI, Google Maps, Telegram
Need custom workflows? Contact us
Connect with the creator: Portfolio • Workflows • LinkedIn • Medium • Threads
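As referenced above, the trigger message follows the "Sector; Limit; MapsURL" convention. The following is an illustrative sketch of how a Code node could split that message; it is not the template's actual node, and the field names are assumptions.

```javascript
// Illustrative sketch: splitting the "Sector; Limit; MapsURL" Telegram message.
const text = $input.first().json.message?.text || '';
const [sector, limit, mapsUrl] = text.split(';').map((part) => part.trim());

if (!sector || !mapsUrl) {
  throw new Error('Expected a message in the form "Sector; Limit; MapsURL"');
}

return [{
  json: {
    sector,                              // e.g. "Coffee Shops"
    limit: parseInt(limit, 10) || 10,    // default lead count if omitted or invalid
    mapsUrl,                             // e.g. a google.com/maps/search/... URL
  },
}];
```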
by Kev
⚠️ Important: This workflow uses community nodes (JsonCut, Blotato) and requires a self-hosted n8n instance.

This n8n template automates the entire process of transforming blog articles (and other kinds of websites) into short-form informational videos for Instagram. It scrapes content, generates AI-powered video clips, adds voiceover and subtitles, and publishes directly to social media, all with proper source attribution and branding.

## Who's it for

Content creators, digital marketers, and social media managers who want to repurpose quality blog content into engaging video formats. Perfect for those running content marketing operations who need to maintain a consistent social media presence without manual video editing.

## What it does

The workflow takes a blog article URL as input and produces a fully composed Instagram-ready video with:

- AI-generated background video clips matching the content
- Professional text-to-speech narration
- Auto-generated subtitles with word-by-word animations
- Background music from Creative Commons sources
- Branding overlay and source attribution
- Smooth transitions between scenes
- Direct publishing to Instagram

## How it works

1. **Content Extraction:** Firecrawl scrapes the blog article and extracts clean markdown content
2. **Content Summarization:** An LLM via OpenRouter condenses the article into digestible talking points (max 1,000 characters)
3. **Script Generation:** A second LLM generates 3-5 video prompts, narration text, and a social media caption in structured JSON format
4. **Video Generation:** The Google Veo API creates 8-second background clips in 9:16 format for each prompt
5. **Audio Creation:** OpenAI TTS converts the narration to speech, while the Openverse API fetches royalty-free background music
6. **File Upload:** All assets (videos, voice, music) are uploaded to JsonCut's storage
7. **Video Composition:** JsonCut merges everything together with auto-subtitles, transitions, branding overlays, and source attribution
8. **Publishing:** Blotato uploads the final video to Instagram as a reel with the generated caption

## Setup requirements

Required accounts and credentials:

- **Firecrawl API** - for web scraping
- **OpenRouter API** - for LLM access (uses GPT-4 Mini in this template)
- **Google Gemini API** - for Veo video generation (note: 10 requests/day free-tier limit)
- **OpenAI** - for text-to-speech generation
- **JsonCut account** - for video composition and file hosting
- **Blotato account** - for Instagram publishing
- **Instagram Business account** connected to Blotato

Installation steps:

1. Install community nodes: @mendable/n8n-nodes-firecrawl, n8n-nodes-jsoncut, @blotato/n8n-nodes-blotato
2. Configure all API credentials in n8n's credential manager
3. Update the Blotato Instagram account ID in the "Create Instagram post" node
4. Replace the branding overlay image URL in the JsonCut "Generate media" node config: "path": "https://your-logo-url.png"
5. Test with the chat trigger by entering a blog article URL

## Good to know

- **Cost considerations:** Blotato costs $29 (there are many cheaper alternatives available); JsonCut is free (but the Pro subscription is required for the auto-caption feature); Veo 3 Fast costs approximately $0.15 per second
- **Rate limits:** The Google Veo free tier is limited to 10 requests per day, which allows roughly 2-3 complete workflow runs daily
- **Processing time:** The full workflow takes 5-10 minutes depending on Veo API response times
- **Source attribution:** The workflow automatically extracts the domain from the input URL and displays it on the first video clip
- **Video quality:** Output depends heavily on input quality; the workflow is designed for repurposing legitimate content
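To illustrate step 3 (Script Generation), the structured JSON the second LLM returns could look roughly like the example below. The field names and scene count are assumptions for illustration; the template's actual output parser schema may differ.

```javascript
// Hypothetical shape of the script-generation output (illustrative only).
const exampleScript = {
  caption: "5 ways to speed up your morning routine ⏱️ (source: example.com)",
  scenes: [
    {
      videoPrompt: "Sunrise over a tidy kitchen counter, soft light, vertical 9:16",
      narration: "Most mornings are lost to tiny decisions you could make the night before.",
    },
    {
      videoPrompt: "Hands laying out clothes and a packed bag, warm tones, vertical 9:16",
      narration: "Prep your outfit and bag in the evening and you reclaim ten minutes instantly.",
    },
    // ...3-5 scenes total, each mapped to one 8-second Veo clip
  ],
};
console.log(JSON.stringify(exampleScript, null, 2));
```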
by Samyotech
Social Media Posting Automation with Image and Caption

## How it works

This AI-powered workflow streamlines your social media posting process, transforming hours of manual caption writing, image uploading, and scheduling into a fully automated system. You define the topic and image once, and the workflow handles caption generation, review, approval, and posting to your selected platforms.

**Automated Flow**

1. **Generate Caption** - Trigger the workflow manually and set your post topic and image URL in the Set node. The AI (GPT-4.1-mini) generates a high-quality, engaging social media caption tailored to your audience, platform, and content goals.
2. **Store in Google Sheet** - The generated caption, along with your image URL and post metadata, is automatically appended to your Google Sheet. This creates a central location to review and manage all your social media content.
3. **Review and Approve** - You review the generated caption in the sheet, make any edits if needed, and update the status to Approved. You can also select the platform(s) where you want to post.
4. **Automatic Posting** - Once the status is updated to Approved, the next workflow is triggered automatically. It posts your caption and image to the selected social media platform(s) without any further manual effort.

The result? A seamless, end-to-end social media posting process where captions are AI-generated, stored, reviewed, and posted automatically. Focus on strategy and engagement instead of repetitive manual posting.

## Setup Steps

Setup time: ~10–15 minutes. What you'll need: OpenAI API key, Google account, access to your social media platform(s).

1. **Connect Your Google Account** - Click on the Google Sheets node in your workflow, select the Credential dropdown, choose + Create New Credential, then authenticate your Google account and grant the necessary permissions.
2. **Initialize Your Spreadsheet** - Run the workflow once by clicking the play button on the start node. This will automatically create a Google Sheet with all the required columns for caption tracking and approval.
3. **Add Your OpenAI API Key** - Navigate to the AI Agent node, click the Credential dropdown, select + Create New Credential, paste your OpenAI API key, and save. Get your API key from platform.openai.com/api-keys.
4. **Set Post Topic and Image** - Update the title in the Set node with the topic you want to post and add the image URL associated with your post.
5. **Review Captions and Approve** - Open your Google Sheet, review the generated captions, update the status to Approved, and select the platform(s) where the post should go live.
6. **Go Live** - Once the status is updated, the workflow will automatically post your content to the selected social media platform(s). Sit back and watch your AI-generated captions and images go live automatically.

Ready to automate your social media? Activate your workflow and start posting smarter today! 🚀💡✅
by inderjeet Bhambra
This workflow contains community nodes that are only compatible with the self-hosted version of n8n.

## How it works

The Content Strategy AI Pipeline is an intelligent, multi-stage content creation system that transforms simple user prompts into polished, ready-to-publish content. The system extracts platform requirements, audience insights, and brand tone from user requests, then develops strategic reasoning and emotional-connection strategies before crafting compelling content outlines and final publication-ready posts or articles. It supports both social media platforms (Instagram, LinkedIn, X, Facebook, TikTok) and blog content.

**Key Differentiators:** Strategic thinking approach, emotional intelligence integration, platform-native optimization, zero-editing-required output, and professional content-strategist-level quality through multi-model AI orchestration.

## Technical Points

- Multi-model AI orchestration for specialized tasks
- Emotional psychology integration for audience connection
- Platform algorithm optimization built in
- Industry-standard content strategy methodology, automated
- Enterprise-grade reliability with session management and memory
- API-ready architecture for integration into existing workflows

## Test Inputs

Sample Request: "Create an Instagram post for a fitness coach targeting busy moms, tone should be motivational and relatable"

Expected Flow: Platform: Instagram → Niche: Fitness → Audience: Busy Moms → Tone: Motivational → Output: 125-150 word post with hashtags
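For the sample request above, the requirements the pipeline extracts could be represented roughly as the object below. Field names are illustrative assumptions, not the template's actual schema.

```javascript
// Hypothetical requirements object extracted from the sample request (illustrative only).
const extractedRequirements = {
  platform: "Instagram",
  niche: "Fitness",
  audience: "Busy Moms",
  tone: "Motivational, relatable",
  format: "Post",
  constraints: { wordCount: "125-150", includeHashtags: true },
};
// Downstream agents would receive an object like this to build the outline and final post.
console.log(JSON.stringify(extractedRequirements, null, 2));
```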
by vinci-king-01
Product Price Monitor with Mailgun and MongoDB

⚠️ COMMUNITY TEMPLATE DISCLAIMER: This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template.

This workflow automatically scrapes multiple e-commerce sites, records weekly product prices in MongoDB, analyzes seasonal trends, and emails a concise report to retail stakeholders via Mailgun. It helps retailers make informed inventory and pricing decisions by providing up-to-date pricing intelligence.

## Pre-conditions / Requirements

**Prerequisites**
- n8n instance (self-hosted, desktop, or n8n.cloud)
- ScrapeGraphAI community node installed and activated
- MongoDB database (Atlas or self-hosted)
- Mailgun account with a verified domain
- Publicly reachable n8n Webhook URL (if self-hosted)

**Required Credentials**
- **ScrapeGraphAI API Key** - Enables web scraping across target sites
- **MongoDB Credentials** - Connection string (MongoDB URI) with read/write access
- **Mailgun API Key & Domain** - To send summary emails

**MongoDB Collection Schema**

| Field | Type | Example Value | Notes |
|-------------|--------|---------------------------|--------------------------------------|
| productId | String | SKU-12345 | Unique identifier you define |
| productName | String | Women's Winter Jacket | Human-readable name |
| timestamp | Date | 2024-09-15T00:00:00Z | Ingest date (automatically added) |
| price | Number | 79.99 | Scraped price |
| source | String | example-shop.com | Domain where price was scraped |

## How it works

Key steps:

1. **Webhook Trigger** - Starts the workflow on a scheduled HTTP call or manual trigger.
2. **Code (Prepare Products)** - Defines the list of SKUs/URLs to monitor.
3. **Split In Batches** - Processes products in manageable chunks to respect rate limits.
4. **ScrapeGraphAI (Scrape Price)** - Extracts price, availability, and currency from each product URL.
5. **Merge (Combine Results)** - Re-assembles all batch outputs into one dataset.
6. **MongoDB (Upsert Price History)** - Stores each price point for historical analysis.
7. **If (Seasonal Trend Check)** - Compares the current price against the historical average to detect anomalies.
8. **Set (Email Payload)** - Formats the trend report for email.
9. **Mailgun (Send Email)** - Emails the weekly summary to the specified recipients.
10. **Respond to Webhook** - Returns a "200 OK - Report Sent" response for logging.
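The seasonal trend check (step 7) boils down to comparing the newest price against an average of prior price points. The sketch below illustrates that comparison in a Code node; the 10% threshold and field names are assumptions, not the template's exact configuration.

```javascript
// Illustrative sketch of the seasonal trend check (threshold and field names assumed).
const THRESHOLD_PCT = 10;

return $input.all().map((item) => {
  const { price, history = [] } = item.json; // history: prior prices for the same productId

  const seasonalAvg = history.length
    ? history.reduce((sum, doc) => sum + doc.price, 0) / history.length
    : price;

  const deviationPct = ((price - seasonalAvg) / seasonalAvg) * 100;

  return {
    json: {
      ...item.json,
      seasonalAvg: Number(seasonalAvg.toFixed(2)),
      deviationPct: Number(deviationPct.toFixed(1)),
      // The downstream IF node would branch on a condition like this:
      isAnomaly: Math.abs(deviationPct) > THRESHOLD_PCT,
    },
  };
});
```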
## Set up steps

Setup time: 15-20 minutes

1. **Install Community Node** - In n8n, go to "Settings → Community Nodes" and install @n8n-community/nodes-scrapegraphai.
2. **Create Credentials** - Add the ScrapeGraphAI API key under Credentials, add MongoDB credentials (type: MongoDB), and add Mailgun credentials (type: Mailgun).
3. **Import Workflow** - Download the JSON template, then in n8n click "Import" and select the file.
4. **Configure Product List** - Open the Code (Prepare Products) node and replace the example array with your product objects { id, name, url }.
5. **Adjust Cron/Schedule** - If you prefer a fully automated schedule, replace the Webhook with a Cron node (e.g., every Monday at 09:00).
6. **Verify MongoDB Collection** - Ensure the collection (default: productPrices) exists, or let n8n create it on first run.
7. **Set Recipients** - In the Mailgun node, update the to, from, and subject fields.
8. **Execute Test Run** - Manually trigger the Webhook URL or run the workflow once to verify data flow and email delivery.
9. **Activate** - Toggle the workflow to "Active" so it runs automatically each week.

## Node Descriptions

Core workflow nodes:

- **Webhook** - Entry point that accepts a GET/POST call to start the job.
- **Code (Prepare Products)** - Outputs an array of products to monitor.
- **Split In Batches** - Limits scraping to N products per request to avoid bans.
- **ScrapeGraphAI** - Scrapes the HTML of a product page and parses pricing data.
- **Merge** - Re-combines batch results for streamlined processing.
- **MongoDB** - Inserts or updates each product's price-history document.
- **If** - Determines whether the price deviates more than X% from the seasonal average.
- **Set** - Builds an HTML/text email body containing the findings.
- **Mailgun** - Sends the email via the Mailgun REST API.
- **Respond to Webhook** - Returns an HTTP response for logging/monitoring.
- **Sticky Notes** - Provide in-workflow documentation (no execution).

Data flow:

Webhook → Code → Split In Batches
Split In Batches → ScrapeGraphAI → Merge
Merge → MongoDB → If
If (true) → Set → Mailgun → Respond to Webhook

## Customization Examples

Change scraping frequency (Cron):

```
// Cron node settings
{
  "mode": "custom",
  "cronExpression": "0 6 * * 1,4" // Monday & Thursday 06:00
}
```

Extend data points (review count, stock):

```
// In ScrapeGraphAI extraction config
{
  "price": "css:span.price",
  "inStock": "css:div.availability",
  "reviewCount": "regex:\"(\\d+) reviews\""
}
```

## Data Output Format

The workflow outputs structured JSON data:

```
{
  "productId": "SKU-12345",
  "productName": "Women's Winter Jacket",
  "timestamp": "2024-09-15T00:00:00Z",
  "price": 79.99,
  "currency": "USD",
  "source": "example-shop.com",
  "trend": "5% below 3-month average"
}
```

## Troubleshooting

Common issues:

- **ScrapeGraphAI returns empty data** - Confirm selectors/XPath are correct; test with the ScrapeGraphAI playground.
- **MongoDB connection fails** - Verify IP whitelisting for Atlas, or network connectivity for a self-hosted instance.
- **Mail not delivered** - Check Mailgun logs for bounces or spam rejection, and ensure the from domain is verified.

Performance tips:

- Use smaller batch sizes (e.g., 5 URLs) to avoid target-site rate-limit blocks.
- Cache static product info; scrape only fields that change (price, stock).

Pro tips:

- Integrate the IF node with n8n's Slack node to push urgent price drops to a channel.
- Add a Function node to calculate moving averages for deeper analysis (see the sketch below).
- Store raw HTML snapshots in S3/MinIO for auditability and debugging.
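Following the pro tip above, a moving-average Function/Code node could look like this sketch. It assumes each item carries a price-history array; the window size and field names are illustrative.

```javascript
// Sketch of a moving-average calculation (assumed fields: history = [{ timestamp, price }, ...]).
const WINDOW = 12; // number of most recent weekly price points to average

return $input.all().map((item) => {
  const history = item.json.history || [];
  const recent = history
    .slice()
    .sort((a, b) => new Date(b.timestamp) - new Date(a.timestamp))
    .slice(0, WINDOW);

  const movingAvg =
    recent.length > 0
      ? recent.reduce((sum, p) => sum + p.price, 0) / recent.length
      : null;

  return {
    json: {
      ...item.json,
      movingAvg: movingAvg !== null ? Number(movingAvg.toFixed(2)) : null,
      sampleSize: recent.length,
    },
  };
});
```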
by Meak
LinkedIn Job-Based Cold Email System

Most outreach tools rely on generic lead lists and recycled contact data. This workflow builds a live, personalized lead engine that scrapes new LinkedIn job posts, finds company decision-maker emails, and generates custom cold emails using GPT, all fully automated through n8n.

## Benefits

- Automated daily scraping of "Marketing Manager" jobs in Belgium
- Real-time leads from companies currently hiring for marketing roles
- Filters out HR and staffing agencies to keep only real businesses
- Enriches each company with verified CEO, Sales, and Marketing emails
- Generates unique, human-like cold emails and subject lines with GPT-4o
- Saves clean data to Google Sheets and drafts personalized Gmail messages

## How It Works

1. Schedule Trigger runs every morning at 08:00.
2. Apify LinkedIn Scraper collects new "Marketing Manager" jobs in Belgium.
3. Remove Duplicates ensures each company appears only once.
4. Filter Staffing excludes recruiters, HR agencies, and interim firms.
5. Save Useful Infos extracts core company data — name, domain, size, description.
6. Filter Domain & Size keeps valid websites and companies under 100 employees.
7. Anymailfinder API looks up CEO, Sales, and Marketing decision-maker emails.
8. Merge + If Node validates email results and removes invalid entries.
9. Split Out + Deduplicate ensures unique, verified contacts.
10. Extract Lead Name (Code Node) separates first and last names.
11. Google Sheets Node appends all enriched lead data to your master sheet.
12. GPT-4o (LangChain) writes a 100-120 word personalized cold email.
13. GPT-4o (LangChain) creates a short, casual subject line.
14. Gmail Draft Node builds a ready-to-send email using both outputs.
15. Wait Node loops until all leads are processed.

## Who Is This For

- B2B agencies targeting Belgian SMEs
- Outbound marketers using job postings as purchase-intent signals
- Freelancers or founders running lean, automated outreach systems
- Growth teams building scalable cold email engines

## Setup

- **Apify**: use the curious_coder~linkedin-jobs-scraper actor + API token
- **Anymailfinder**: header auth with decision-maker categories (ceo, sales, marketing)
- **Google Sheets**: connect a sheet named "LinkedIn Job Scraper" and map columns
- **OpenAI (GPT-4o)**: insert your API key into both LangChain nodes
- **Gmail**: OAuth2 connection; resource set to draft
- **n8n**: store all credentials securely; set HTTP nodes to continue on error

## ROI & Results

- Save 1-3 hours per day on manual research and outreach prep
- Contact active hiring companies when they need marketing help most
- Scale to multiple industries or regions by changing search URLs
- Outperform paid lead databases with fresh, verified data

## Strategy Insights

- Add funding or tech-stack data for better lead scoring
- A/B test GPT subject lines and log open rates in Sheets
- Schedule GPT follow-ups 3 and 7 days later for full automation
- Push all enriched data to your CRM for advanced segmentation
- Use hiring signals to trigger ad audiences or retargeting campaigns

## Check Out My Channel

For more advanced automation workflows that generate real client results, check out my YouTube channel, where I share the exact systems I use to automate outreach, scale agency pipelines, and close deals faster.
by Onur
Lead Sourcing by Job Posts for Outreach with Scrape.do API, OpenAI & Google Sheets

## Overview

This n8n workflow automates the complete lead generation process by scraping job postings from Indeed, enriching company data via Apollo.io, identifying decision-makers, and generating personalized LinkedIn outreach messages using OpenAI. It integrates with Scrape.do for reliable web scraping, Apollo.io for B2B data enrichment, OpenAI for AI-powered personalization, and Google Sheets for centralized data storage.

Perfect for: sales teams, recruiters, business development professionals, and marketing agencies looking to automate their outbound prospecting pipeline.

## Workflow Components

### 1. ⏰ Schedule Trigger

| Property | Value |
|----------|-------|
| Type | Schedule Trigger |
| Purpose | Automatically initiates the workflow on a recurring schedule |
| Frequency | Weekly (every Monday) |
| Time | 00:00 UTC |

Function: Ensures consistent, hands-off lead generation by running the pipeline automatically without manual intervention.

### 2. 🔍 Scrape.do Indeed API

| Property | Value |
|----------|-------|
| Type | HTTP Request (GET) |
| Purpose | Scrapes job listings from Indeed via the Scrape.do proxy API |
| Endpoint | https://api.scrape.do |
| Output Format | Markdown |

Request parameters:

| Parameter | Value | Description |
|-----------|-------|-------------|
| token | API Token | Scrape.do authentication |
| url | Indeed Search URL | Target job search page |
| super | true | Uses residential proxies |
| geoCode | us | US-based content |
| render | true | JavaScript rendering enabled |
| device | mobile | Mobile viewport for cleaner HTML |
| output | markdown | Lightweight text output |

Function: Fetches Indeed job listings with anti-bot bypass, returning clean markdown for easy parsing.

### 3. 📋 Parse Indeed Jobs

| Property | Value |
|----------|-------|
| Type | Code Node (JavaScript) |
| Purpose | Extracts structured job data from markdown |
| Mode | Run once for all items |

Extracted fields:

| Field | Description | Example |
|-------|-------------|---------|
| jobTitle | Position title | "Senior Data Engineer" |
| jobUrl | Indeed job link | "https://indeed.com/viewjob?jk=abc123" |
| jobId | Indeed job identifier | "abc123" |
| companyName | Hiring company | "Acme Corporation" |
| location | City, State | "San Francisco, CA" |
| salary | Pay range | "$120,000 - $150,000" |
| jobType | Employment type | "Full-time" |
| source | Data source | "Indeed" |
| dateFound | Scrape date | "2025-01-15" |

Function: Parses the markdown using regex patterns, filters invalid entries, and deduplicates by company name.
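The following is an illustrative sketch of the kind of parsing and deduplication the Parse Indeed Jobs node performs. The real patterns depend on the exact markdown Scrape.do returns, so the regexes, the `data` input field, and the assumed "Company - Location" layout are assumptions rather than the template's actual code.

```javascript
// Illustrative sketch of the "Parse Indeed Jobs" Code node (patterns are assumptions).
const markdown = $input.first().json.data || '';
const jobs = [];
const seenCompanies = new Set();

// Assume each listing appears as a markdown link [Job Title](...viewjob?jk=ID),
// followed nearby by a "Company - City, ST" line.
const linkRe = /\[([^\]]+)\]\((https?:\/\/[^)]*viewjob\?jk=([A-Za-z0-9]+)[^)]*)\)/g;

let match;
while ((match = linkRe.exec(markdown)) !== null) {
  const [, jobTitle, jobUrl, jobId] = match;

  // Look at the text right after the link for company and location (assumed layout)
  const tail = markdown.slice(match.index, match.index + 300);
  const metaMatch = tail.match(/\n([^\n|-]+)\s[-–]\s([A-Za-z .]+,\s?[A-Z]{2})/);
  const companyName = metaMatch ? metaMatch[1].trim() : '';
  const location = metaMatch ? metaMatch[2].trim() : '';

  // Skip entries without a company and deduplicate by company name
  if (!companyName || seenCompanies.has(companyName.toLowerCase())) continue;
  seenCompanies.add(companyName.toLowerCase());

  jobs.push({
    json: {
      jobTitle, jobUrl, jobId, companyName, location,
      source: 'Indeed',
      dateFound: new Date().toISOString().slice(0, 10),
    },
  });
}

return jobs;
```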
### 4. 📊 Add New Company (Google Sheets)

| Property | Value |
|----------|-------|
| Type | Google Sheets Node |
| Purpose | Stores parsed job postings for tracking |
| Operation | Append rows |
| Target Sheet | "Add New Company" |

Function: Creates a historical record of all discovered job postings and companies for pipeline tracking.

### 5. 🏢 Apollo Organization Search

| Property | Value |
|----------|-------|
| Type | HTTP Request (POST) |
| Purpose | Enriches company data via the Apollo.io API |
| Endpoint | https://api.apollo.io/v1/organizations/search |
| Authentication | HTTP Header Auth (x-api-key) |

Request body:

```
{
  "q_organization_name": "Company Name",
  "page": 1,
  "per_page": 1
}
```

Response fields:

| Field | Description |
|-------|-------------|
| id | Apollo organization ID |
| name | Official company name |
| website_url | Company website |
| linkedin_url | LinkedIn company page |
| industry | Business sector |
| estimated_num_employees | Company size |
| founded_year | Year established |
| city, state, country | Location details |
| short_description | Company overview |

Function: Retrieves comprehensive company intelligence including LinkedIn profiles, industry classification, and employee count.

### 6. 📤 Extract Apollo Org Data

| Property | Value |
|----------|-------|
| Type | Code Node (JavaScript) |
| Purpose | Parses the Apollo response and merges it with the original data |
| Mode | Run once for each item |

Function: Extracts relevant fields from the Apollo API response and combines them with the job posting data for downstream processing.

### 7. 👥 Apollo People Search

| Property | Value |
|----------|-------|
| Type | HTTP Request (POST) |
| Purpose | Finds decision-makers at target companies |
| Endpoint | https://api.apollo.io/v1/mixed_people/search |
| Authentication | HTTP Header Auth (x-api-key) |

Request body:

```
{
  "organization_ids": ["apollo_org_id"],
  "person_titles": [
    "CTO", "Chief Technology Officer", "VP Engineering", "Head of Engineering",
    "Engineering Manager", "Technical Director", "CEO", "Founder"
  ],
  "page": 1,
  "per_page": 3
}
```

Response fields:

| Field | Description |
|-------|-------------|
| first_name | Contact first name |
| last_name | Contact last name |
| title | Job title |
| email | Email address |
| linkedin_url | LinkedIn profile URL |
| phone_number | Direct phone |

Function: Identifies key stakeholders and decision-makers based on configurable title filters.

### 8. 📝 Format Leads

| Property | Value |
|----------|-------|
| Type | Code Node (JavaScript) |
| Purpose | Structures lead data for outreach |
| Mode | Run once for all items |

Function: Combines person data with company context, creating comprehensive lead profiles ready for personalization.

### 9. 🤖 Generate Personalized Message (OpenAI)

| Property | Value |
|----------|-------|
| Type | OpenAI Node |
| Purpose | Creates custom LinkedIn connection messages |
| Model | gpt-4o-mini |
| Max Tokens | 150 |
| Temperature | 0.7 |

System prompt:

> You are a professional outreach specialist. Write personalized LinkedIn connection request messages. Keep messages under 300 characters. Be friendly, professional, and mention a specific reason for connecting based on their role and company.

User prompt variables:

| Variable | Source |
|----------|--------|
| Name | $json.fullName |
| Title | $json.title |
| Company | $json.companyName |
| Industry | $json.industry |
| Job Context | $json.jobTitle |

Function: Generates unique, contextual outreach messages that reference specific hiring activity and company details.

### 10. 🔗 Merge Lead + Message

| Property | Value |
|----------|-------|
| Type | Code Node (JavaScript) |
| Purpose | Combines lead data with the generated message |
| Mode | Run once for each item |

Function: Merges the OpenAI response with the lead profile, creating the final enriched record.
### 11. 💾 Save Leads to Sheet

| Property | Value |
|----------|-------|
| Type | Google Sheets Node |
| Purpose | Stores final lead data with personalized messages |
| Operation | Append rows |
| Target Sheet | "Leads" |

Data mapping:

| Column | Data |
|--------|------|
| First Name | Lead's first name |
| Last Name | Lead's last name |
| Title | Job title |
| Company | Company name |
| LinkedIn URL | Profile link |
| Country | Location |
| Industry | Business sector |
| Date Added | Timestamp |
| Source | "Indeed + Apollo" |
| Personalized Message | AI-generated outreach text |

Function: Creates an actionable lead database ready for outreach campaigns.

## Workflow Flow

```
⏰ Schedule Trigger
   │
   ▼
🔍 Scrape.do Indeed API ──► Fetches job listings with JS rendering
   │
   ▼
📋 Parse Indeed Jobs ──► Extracts company names, job details
   │
   ▼
📊 Add New Company ──► Saves to Google Sheets (Companies)
   │
   ▼
🏢 Apollo Org Search ──► Enriches company data
   │
   ▼
📤 Extract Apollo Org Data ──► Parses API response
   │
   ▼
👥 Apollo People Search ──► Finds decision-makers
   │
   ▼
📝 Format Leads ──► Structures lead profiles
   │
   ▼
🤖 Generate Personalized Message ──► AI creates custom outreach
   │
   ▼
🔗 Merge Lead + Message ──► Combines all data
   │
   ▼
💾 Save Leads to Sheet ──► Final storage (Leads)
```

## Configuration Requirements

API keys & credentials:

| Credential | Purpose | Where to Get |
|------------|---------|--------------|
| Scrape.do API Token | Web scraping with anti-bot bypass | scrape.do/dashboard |
| Apollo.io API Key | B2B data enrichment | apollo.io/settings/integrations |
| OpenAI API Key | AI message generation | platform.openai.com |
| Google Sheets OAuth2 | Data storage | n8n Credentials Setup |

n8n credential setup:

| Credential Type | Configuration |
|-----------------|---------------|
| HTTP Header Auth (Apollo) | Header: x-api-key, Value: your Apollo API key |
| OpenAI API | API Key: your OpenAI API key |
| Google Sheets OAuth2 | Complete the OAuth flow with Google |

## Key Features

**🔍 Intelligent Job Scraping**
- **Anti-Bot Bypass:** Residential proxy rotation via Scrape.do
- **JavaScript Rendering:** Full headless browser for dynamic content
- **Mobile Optimization:** Cleaner HTML with mobile viewport
- **Markdown Output:** Lightweight, easy-to-parse format

**🏢 B2B Data Enrichment**
- **Company Intelligence:** Industry, size, location, LinkedIn
- **Decision-Maker Discovery:** Title-based filtering
- **Contact Information:** Email, phone, LinkedIn profiles
- **Real-Time Data:** Fresh information from Apollo.io

**🤖 AI-Powered Personalization**
- **Contextual Messages:** References specific hiring activity
- **Character Limit:** Optimized for LinkedIn (300 chars)
- **Variable Temperature:** Balanced creativity and consistency
- **Role-Specific:** Tailored to the recipient's title and company

**📊 Automated Data Management**
- **Dual Sheet Storage:** Companies + Leads separation
- **Timestamp Tracking:** Historical records
- **Deduplication:** Prevents duplicate entries
- **Ready for Export:** CSV-compatible format

## Use Cases

**🎯 Sales Prospecting**
- Identify companies actively hiring in your target market
- Find decision-makers at companies investing in growth
- Generate personalized cold outreach at scale
- Track the pipeline from discovery to contact

**👥 Recruiting & Talent Acquisition**
- Monitor competitor hiring patterns
- Identify companies building specific teams
- Connect with hiring managers directly
- Build talent pipeline relationships

**📈 Market Intelligence**
- Track industry hiring trends
- Monitor competitor expansion signals
- Identify emerging market opportunities
- Benchmark salary ranges by role

**🤝 Partnership Development**
- Find companies investing in complementary areas
- Identify potential integration partners
- Connect with technical leadership
- Build strategic relationship pipelines
## Technical Notes

| Specification | Value |
|---------------|-------|
| Processing Time | 2-5 minutes per run (depending on job count) |
| Jobs per Run | ~25 unique companies |
| API Calls per Run | 1 Scrape.do + 25 Apollo Org + 25 Apollo People + ~75 OpenAI |
| Data Accuracy | 90%+ for company matching |
| Success Rate | 99%+ with proper error handling |

Rate limits to consider:

| Service | Free Tier Limit | Recommendation |
|---------|-----------------|----------------|
| Scrape.do | 1,000 credits/month | ~40 runs/month |
| Apollo.io | 100 requests/day | Add Wait nodes if needed |
| OpenAI | Based on usage | Monitor costs (~$0.01-0.05/run) |
| Google Sheets | 300 requests/minute | No issues expected |

## Setup Instructions

**Step 1: Import Workflow**
1. Copy the JSON workflow configuration
2. In n8n: Workflows → Import from JSON
3. Paste the configuration and save

**Step 2: Configure Scrape.do**
1. Sign up at scrape.do
2. Navigate to Dashboard → API Token and copy your token
3. The token is embedded in the URL query parameter (already configured)
4. To customize the search, change the url parameter in the "Scrape.do Indeed API" node: q=data+engineer (search term), l=Remote (location), fromage=7 (last 7 days)

**Step 3: Configure Apollo.io**
1. Sign up at apollo.io
2. Go to Settings → Integrations → API Keys and create a new API key
3. In n8n: Credentials → Add Credential → Header Auth (Name: x-api-key, Value: your Apollo API key)
4. Select this credential in both Apollo HTTP nodes

**Step 4: Configure OpenAI**
1. Go to platform.openai.com and create a new API key
2. In n8n: Credentials → Add Credential → OpenAI and paste the API key
3. Select the credential in the "Generate Personalized Message" node

**Step 5: Configure Google Sheets**
1. Create a new Google Spreadsheet with two sheets:
   - Sheet 1: "Add New Company" with columns: companyName | jobTitle | jobUrl | location | salary | source | postedDate
   - Sheet 2: "Leads" with columns: First Name | Last Name | Title | Company | LinkedIn URL | Country | Industry | Date Added | Source | Personalized Message
2. Copy the Sheet ID from the URL
3. In n8n: Credentials → Add Credential → Google Sheets OAuth2
4. Update both Google Sheets nodes with your Sheet ID

**Step 6: Test and Activate**
1. Manual test: click the "Execute Workflow" button
2. Verify each node: check outputs step by step
3. Review data: confirm data appears in Google Sheets
4. Activate: toggle the workflow to "Active"

## Error Handling

Common issues:

| Issue | Cause | Solution |
|-------|-------|----------|
| "Invalid character: " | Empty/malformed company name | Check the Parse Indeed Jobs output |
| "Node does not have credentials" | Credential not linked | Open the node → select the credential |
| Empty parse results | Indeed HTML structure changed | Check the Scrape.do raw output |
| Apollo rate limit (429) | Too many requests | Add a 5-10s Wait node between calls |
| OpenAI timeout | Too many tokens | Reduce batch size or max_tokens |
| "Your request is invalid" | Malformed JSON body | Verify expression syntax in the HTTP nodes |

Troubleshooting steps:

1. Verify credentials: test each credential individually
2. Check node outputs: use "Execute Node" for debugging
3. Monitor API usage: check the Apollo and OpenAI dashboards
4. Review logs: check the n8n execution history for details
5. Test with a sample: use a known company name to verify Apollo

Recommended error-handling additions for production use:

- An IF node after Apollo Org Search to handle empty results
- An Error Workflow trigger for notifications
- Wait nodes between API calls for rate limiting
- Retry logic for transient failures
## Performance Specifications

| Metric | Value |
|--------|-------|
| Execution Time | 2-5 minutes per scheduled run |
| Jobs Discovered | ~25 per Indeed page |
| Leads Generated | 1-3 per company (based on title matches) |
| Message Quality | Professional, contextual, <300 chars |
| Data Freshness | Real-time from Indeed + Apollo |
| Storage Format | Google Sheets (unlimited rows) |

## API Reference

Scrape.do API (documentation: scrape.do/documentation):

| Endpoint | Method | Purpose |
|----------|--------|---------|
| https://api.scrape.do | GET | Direct URL scraping |

Apollo.io API (documentation: apolloio.github.io/apollo-api-docs):

| Endpoint | Method | Purpose |
|----------|--------|---------|
| /v1/organizations/search | POST | Company lookup |
| /v1/mixed_people/search | POST | People search |

OpenAI API (documentation: platform.openai.com):

| Endpoint | Method | Purpose |
|----------|--------|---------|
| /v1/chat/completions | POST | Message generation |
by Cheng Siong Chin
## Introduction

Automates flight deal discovery and intelligent analysis for travel bloggers and deal hunters. Scrapes live pricing, enriches it with weather data, applies AI evaluation, and auto-publishes to WordPress, eliminating manual research and accelerating content delivery.

## How It Works

A user submits a route via a form; the workflow scrapes real-time flight prices and weather data, AI analyzes deal quality considering weather conditions, formats the results, publishes to WordPress, and sends a Slack notification — fully automated from input to publication.

## Workflow Template

Form Input → Extract Data → Scrape Flight Prices → Extract Pricing → Fetch Weather → Parse Weather → Prepare AI Input → AI Analysis → Parse Output → Format Results → Publish WordPress → Slack Alert → User Response

## Setup Instructions

- **Form Setup:** Configure user input fields for flight routes and preferences
- **APIs:** Connect the Google Flights scraping endpoint, weather API credentials, and OpenAI/Chat Model API key
- **Publishing:** Set WordPress credentials, the target blog category, and the Slack webhook URL
- **AI Configuration:** Define analysis prompts, output structure, and parser rules

## Workflow Steps

- **Data Collection:** The form captures the route, then the workflow scrapes Google Flights pricing and fetches destination weather via API
- **AI Processing:** Enriches flight data with weather context and analyzes deal quality using an OpenAI/Chat Model with structured output parsing
- **Publishing:** Formats the analysis results, creates a WordPress post, sends a Slack notification, and delivers the response to the user

## Prerequisites

n8n instance, Google Flights access, weather API key, OpenAI or a compatible AI service, WordPress site with API access, Slack workspace

## Use Cases

Travel blog automation, flight deal newsletters, price comparison services, seasonal travel planning, destination weather analysis, automated social media content

## Customization

Modify the AI analysis criteria, adjust weather impact weighting, customize WordPress post templates, add email distribution, integrate additional data sources, or expand to hotel/rental deals

## Benefits

Eliminates manual price checking, combines multiple data sources automatically, delivers AI-enhanced insights, accelerates the publishing workflow, scales across unlimited routes, and provides weather-aware recommendations