by Kev
⚠️ **Important:** This workflow uses community nodes (JsonCut, Blotato) and requires a self-hosted n8n instance.

This n8n template automates the entire process of transforming blog articles and other websites into short-form informational videos for Instagram. It scrapes content, generates AI-powered video clips, adds voiceover and subtitles, and publishes directly to social media—all with proper source attribution and branding.

## Who's it for

Content creators, digital marketers, and social media managers who want to repurpose quality blog content into engaging video formats. Perfect for those running content marketing operations who need to maintain a consistent social media presence without manual video editing.

## What it does

The workflow takes a blog article URL as input and produces a fully composed Instagram-ready video with:

- AI-generated background video clips matching the content
- Professional text-to-speech narration
- Auto-generated subtitles with word-by-word animations
- Background music from Creative Commons sources
- Branding overlay and source attribution
- Smooth transitions between scenes
- Direct publishing to Instagram

## How it works

1. **Content Extraction**: Firecrawl scrapes the blog article and extracts clean markdown content
2. **Content Summarization**: An LLM via OpenRouter condenses the article into digestible talking points (max 1,000 characters)
3. **Script Generation**: A second LLM generates 3-5 video prompts, narration text, and a social media caption in structured JSON format
4. **Video Generation**: The Google Veo API creates 8-second background clips in 9:16 format for each prompt
5. **Audio Creation**: OpenAI TTS converts the narration to speech, while the Openverse API fetches royalty-free background music
6. **File Upload**: All assets (videos, voice, music) are uploaded to JsonCut's storage
7. **Video Composition**: JsonCut merges everything together with auto-subtitles, transitions, branding overlays, and source attribution
8. **Publishing**: Blotato uploads the final video to Instagram as a reel with the generated caption

## Setup requirements

Required accounts and credentials:

- **Firecrawl API** - for web scraping
- **OpenRouter API** - for LLM access (uses GPT-4 Mini in this template)
- **Google Gemini API** - for Veo video generation (note: 10 requests/day free tier limit)
- **OpenAI** - for text-to-speech generation
- **JsonCut account** - for video composition and file hosting
- **Blotato account** - for Instagram publishing
- **Instagram Business account** connected to Blotato

Installation steps:

1. Install the community nodes: @mendable/n8n-nodes-firecrawl, n8n-nodes-jsoncut, @blotato/n8n-nodes-blotato
2. Configure all API credentials in n8n's credential manager
3. Update the Blotato Instagram account ID in the "Create Instagram post" node
4. Replace the branding overlay image URL in the JsonCut "Generate media" node config: "path": "https://your-logo-url.png"
5. Test with the chat trigger by entering a blog article URL

## Good to know

- **Cost considerations**: Blotato costs $29 (there are many cheaper alternatives available). JsonCut is free (but the Pro subscription is required for the auto-caption feature). Veo 3 Fast costs approximately $0.15 per second.
- **Rate limits**: The Google Veo free tier is limited to 10 requests per day, which allows roughly 2-3 complete workflow runs daily.
- **Processing time**: A full run takes 5-10 minutes depending on Veo API response times.
- **Source attribution**: The workflow automatically extracts the domain from the input URL and displays it on the first video clip.
- **Video quality**: Output depends heavily on input quality. The workflow is designed for repurposing legitimate content.
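For reference, the structured output of the script-generation step might look roughly like this (field names are illustrative, not the template's exact schema):

```json
{
  "scenes": [
    {
      "video_prompt": "Slow vertical 9:16 aerial shot of a sunrise over a city skyline",
      "narration": "Most teams spend hours turning articles into video."
    },
    {
      "video_prompt": "Close-up of hands typing on a laptop in a bright home office, 9:16",
      "narration": "This workflow does it automatically, from scrape to published reel."
    }
  ],
  "caption": "How to turn any blog post into a reel in minutes. Source: example.com"
}
```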
by Omer Fayyaz
An intelligent web scraping workflow that automatically routes URLs to site-specific extraction logic, normalizes data across multiple sources, and filters content by freshness to build a unified article feed.

**What Makes This Different:**

- **Intelligent Source Routing** - Uses a Switch node to route URLs to specialized extractors based on a source identifier, enabling custom CSS selectors per publisher for maximum accuracy
- **Universal Fallback Parser** - An advanced regex-based extractor handles unknown sources automatically, extracting title, description, author, date, and images from meta tags and HTML patterns
- **Freshness Filtering** - A built-in 45-day freshness threshold filters outdated content before saving, with configurable date validation logic
- **Tier-Based Classification** - Automatically categorizes articles into Tier 1 (0-7 days), Tier 2 (8-14 days), Tier 3 (15-30 days), or Archive based on publication date
- **Rate Limiting & Error Handling** - Built-in 3-second delays between requests prevent server overload, with comprehensive error handling that continues processing even if individual URLs fail
- **Status Tracking** - Updates the source spreadsheet with processing status, enabling easy monitoring and retry logic for failed extractions

**Key Benefits of Multi-Source Content Aggregation:**

- **Scalable Architecture** - Easily add new sources by adding a Switch rule and an extraction node; no code changes are needed for most sites
- **Data Normalization** - Standardizes extracted data across all sources into a consistent format (title, description, author, date, image, canonical URL)
- **Automated Processing** - Schedule-based execution (every 4 hours) or manual triggers keep your feed updated without manual intervention
- **Quality Control** - Freshness filtering ensures only recent, relevant content enters your feed, reducing noise from outdated articles
- **Flexible Input** - Reads from Google Sheets, making it easy to add URLs in bulk or integrate with other systems
- **Comprehensive Metadata** - Captures full article metadata including canonical URLs, publication dates, author information, and featured images

## Who's it for

This template is designed for content aggregators, news monitoring services, content marketers, SEO professionals, researchers, and anyone who needs to collect and normalize articles from multiple websites. It's perfect for organizations that need to monitor competitor content, aggregate industry news, build content databases, track publication trends, or create unified article feeds without manually scraping each site or writing custom scrapers for every source.

## How it works / What it does

This workflow creates a unified article aggregation system that reads URLs from Google Sheets, routes them to site-specific extractors, normalizes the data, filters by freshness, and saves the results to a feed.
The system:

1. **Reads Pending URLs** - Fetches URLs with source identifiers from Google Sheets, filtering for entries with "Pending" status
2. **Processes with Rate Limiting** - Loops through URLs one at a time with a 3-second delay between requests to respect server resources
3. **Fetches HTML Content** - Downloads page HTML with proper browser headers (User-Agent, Accept, Accept-Language) to avoid blocking
4. **Routes by Source** - A Switch node directs URLs to specialized extractors (Site A, B, C, D) or the universal fallback parser based on the Source field
5. **Extracts Article Data** - Site-specific HTML nodes use custom CSS selectors, while the fallback uses regex patterns to extract title, description, author, date, image, and canonical URL
6. **Normalizes Data** - Standardizes all extracted fields into a consistent format, handling missing values and trimming whitespace
7. **Filters by Freshness** - Validates publication dates and filters out articles older than 45 days (configurable threshold)
8. **Calculates Tier & Status** - Assigns a tier classification and freshness status based on article age
9. **Saves to Feed** - Appends normalized articles to the Article Feed sheet with all metadata
10. **Updates Status** - Marks processed URLs as complete in the source sheet for tracking

**Key Innovation: Source-Based Routing** - Unlike generic scrapers that use one-size-fits-all extraction, this workflow uses intelligent routing to apply site-specific CSS selectors. This dramatically improves extraction accuracy while maintaining a universal fallback for unknown sources, making it both precise and extensible.

## How to set up

### 1. Prepare Google Sheets

- Create a Google Sheet with two tabs: "URLs to Process" and "Article Feed"
- In the "URLs to Process" sheet, create columns: URL, Source, Status
- Add sample data: URLs in the URL column, source identifiers (e.g., "Site A", "Site B") in the Source column, and "Pending" in the Status column
- In the "Article Feed" sheet, the workflow will automatically create columns: Title, Description, Author, datePublished, imageUrl, canonicalUrl, source, sourceUrl, tier, freshnessStatus, extractedAt
- Verify your Google Sheets credentials are set up in n8n (OAuth2 recommended)

### 2. Configure Google Sheets Nodes

- Open the "Read Pending URLs" node and select your spreadsheet from the document dropdown
- Set the sheet name to "URLs to Process"
- Configure the "Save to Article Feed" node: select the same spreadsheet, set the sheet name to "Article Feed", and set the operation to "Append or Update"
- Configure the "Update URL Status" node: same spreadsheet, "URLs to Process" sheet, operation "Update"
- Test the connection by running the "Read Pending URLs" node manually to verify it can access your sheet

### 3. Customize Source Routing

- Open the "Source Router" (Switch node) to see the current routing rules for Site A, B, C, D, and the fallback
- To add a new source: click "Add Rule" and set the condition `{{ $('Loop Over URLs').item.json.Source }}` equals your source name
- Create a new HTML extraction node for your source with appropriate CSS selectors
- Connect the new extractor to the "Normalize Extracted Data" node
- Update the Switch node to route to your new extractor

Example CSS selectors for common sites:

```
// WordPress sites
title: "h1.entry-title, .post-title"
author: ".author-name, .byline a"
date: "time.entry-date, time[datetime]"

// Modern CMS
title: "h1.article__title, article h1"
author: ".article__byline a, a[rel='author']"
date: "time[datetime], meta[property='article:published_time']"
```
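If a source has no dedicated extractor, the universal fallback parser described above relies on meta tags and generic HTML patterns. Here is a minimal sketch of that style of extraction for an n8n Code node; it is not the template's exact code, and the `data` field holding the raw HTML is an assumption:

```javascript
// Illustrative meta-tag fallback extractor for an n8n Code node (Run Once for All Items).
// The template's own fallback parser may use different patterns and field names.
const html = $input.first().json.data || '';

// Pull content="..." from a meta tag identified by property= or name=
function meta(name) {
  const re = new RegExp(
    '<meta[^>]+(?:property|name)=["\']' + name + '["\'][^>]+content=["\']([^"\']*)["\']',
    'i'
  );
  const m = html.match(re);
  return m ? m[1].trim() : '';
}

const title =
  meta('og:title') ||
  ((html.match(/<title[^>]*>([^<]*)<\/title>/i) || [])[1] || '').trim();

return [{
  json: {
    Title: title,
    Description: meta('og:description') || meta('description'),
    Author: meta('author') || meta('article:author'),
    datePublished: meta('article:published_time'),
    imageUrl: meta('og:image'),
    canonicalUrl:
      (html.match(/<link[^>]+rel=["']canonical["'][^>]+href=["']([^"']*)["']/i) || [])[1] || '',
  },
}];
```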
### 4. Configure Freshness Threshold

- Open the "Freshness Filter (45 days)" IF node
- The current threshold is 45 days (configurable in the condition expression)
- To change the threshold: modify the expression `cutoffDate.setDate(cutoffDate.getDate() - 45)` to your desired number of days (a reference sketch of the freshness and tier logic appears at the end of this template)
- The filter marks articles as "Fresh" (within threshold) or routes them to the "Outdated" handler
- Test with sample URLs to verify date parsing works correctly for your sources

### 5. Set Up Scheduling & Test

- The workflow includes both a Manual Trigger (for testing) and a Schedule Trigger (runs every 4 hours)
- To customize the schedule: open the "Schedule (Every 4 Hours)" node and adjust the interval
- For initial testing: use the Manual Trigger and add 2-3 test URLs to your sheet with Status="Pending"
- Verify execution: check that URLs are fetched, routed correctly, extracted, and saved to the Article Feed
- Monitor the "Completion Summary" node output to see processing statistics
- Check the execution logs for any errors in HTML extraction or date parsing
- Common issues: missing CSS selectors (update the extractor), date format mismatches (adjust date parsing), or rate limiting (increase the wait time if needed)

## Requirements

- **Google Sheets Account** - Active Google account with OAuth2 credentials configured in n8n for reading and writing spreadsheet data
- **Source Spreadsheet** - Google Sheet with "URLs to Process" and "Article Feed" tabs, properly formatted with the required columns
- **n8n Instance** - Self-hosted or cloud n8n instance with access to external websites (the HTTP Request node needs internet connectivity)
- **Source Knowledge** - Understanding of the target websites' HTML structure to configure CSS selectors for site-specific extractors (or use the fallback parser for unknown sources)
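For reference, the freshness check and tier classification configured above (Tier 1: 0-7 days, Tier 2: 8-14, Tier 3: 15-30, Archive beyond that) boil down to simple date arithmetic. A minimal sketch, assuming it runs in an n8n Code node rather than the template's actual IF/Set nodes:

```javascript
// Illustrative only; the template implements this with IF/Set nodes.
const THRESHOLD_DAYS = 45;
const MS_PER_DAY = 1000 * 60 * 60 * 24;

return $input.all().map((item) => {
  const published = new Date(item.json.datePublished);
  const ageDays = (Date.now() - published.getTime()) / MS_PER_DAY;

  // Unparseable dates fall through to Archive/Outdated.
  let tier = 'Archive';
  if (ageDays <= 7) tier = 'Tier 1';
  else if (ageDays <= 14) tier = 'Tier 2';
  else if (ageDays <= 30) tier = 'Tier 3';

  return {
    json: {
      ...item.json,
      tier,
      freshnessStatus: !isNaN(ageDays) && ageDays <= THRESHOLD_DAYS ? 'Fresh' : 'Outdated',
    },
  };
});
```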
by Rully Saputra
# AI Job Matcher with Decodo, Gemini AI & Resume Analysis

Sign up for Decodo — get better pricing here

## Who's it for

This workflow is built for job seekers, recruiters, founders, automation builders, and data engineers who want to automate job discovery and intelligently match job listings against resumes using AI. It's ideal for anyone building job boards, candidate matching systems, hiring pipelines, or personal job alert automations using n8n.

## What this workflow does

This workflow automatically scrapes job listings from SimplyHired using Decodo residential proxies, extracts structured job data with a Gemini AI agent, downloads resumes from Google Drive, extracts and summarizes resume content, and surfaces the most relevant job opportunities. The workflow stores structured results in a database and sends real-time notifications via Telegram, creating a scalable, low-maintenance AI-powered job matching pipeline.

## How it works

1. A schedule trigger starts the workflow automatically
2. Decodo fetches job search result pages from SimplyHired
3. Job card HTML is extracted from the page
4. A Gemini AI agent converts raw HTML into structured job data
5. Resume PDFs are downloaded from Google Drive
6. Resume text is extracted from the PDF files
7. A Gemini AI agent summarizes key resume highlights
8. Job and resume data are stored in a database
9. Matching job alerts are sent via Telegram

## How to set up

1. Add your Decodo API credentials
2. Add your Google Gemini API key
3. Connect Google Drive for resume access
4. Configure your Telegram bot
5. Set up your database (Google Sheets by default)
6. Update the job search URL with your keywords and location

## Requirements

- Self-hosted n8n instance
- Decodo account (community node)
- Google Gemini API access
- Google Drive access
- Telegram Bot token
- Google Sheets or another database

> Note: This template uses a community node (Decodo) and is intended for self-hosted n8n only.

## How to customize the workflow

- Replace SimplyHired with another job board or aggregator
- Add job–resume matching or scoring logic (see the sketch below)
- Extend the resume summary with custom fields
- Swap Google Sheets for PostgreSQL, Supabase, or Airtable
- Route notifications to Slack, Email, or Webhooks
- Add pagination or multi-resume processing
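The template leaves the matching/scoring logic open. As a starting point, a simple keyword-overlap score between the resume summary and each job description could run in an n8n Code node. This is an illustrative sketch, not part of the shipped template, and the field names (`jobDescription`, `resumeSummary`) are assumptions:

```javascript
// Hypothetical keyword-overlap scoring; adjust field names to your data.
const STOPWORDS = new Set(['and', 'the', 'with', 'for', 'you', 'are', 'our']);

function keywords(text) {
  return new Set(
    (text || '')
      .toLowerCase()
      .match(/[a-z][a-z+#.]+/g) // crude tokenizer, keeps terms like "c++" and "node.js"
      ?.filter((w) => !STOPWORDS.has(w)) ?? []
  );
}

return $input.all().map((item) => {
  const jobWords = keywords(item.json.jobDescription);
  const resumeWords = keywords(item.json.resumeSummary);

  let overlap = 0;
  for (const w of jobWords) if (resumeWords.has(w)) overlap++;

  return {
    json: {
      ...item.json,
      matchScore: jobWords.size ? Math.round((overlap / jobWords.size) * 100) : 0,
    },
  };
});
```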
by Onur
# Automated B2B Lead Generation: Google Places, Scrape.do & AI Enrichment

This workflow is a powerful, fully automated B2B lead generation engine. It starts by finding businesses on Google Maps based on your criteria (e.g., "dentists" in "Istanbul"), assigns a quality score to each, and then uses Scrape.do to reliably access their websites. Finally, it leverages an AI agent to extract valuable contact information like emails and social media profiles. The final, enriched data is then neatly organized and saved directly into a Google Sheet.

This template is built for reliability, using Scrape.do to handle the complexities of web scraping, ensuring you can consistently gather data without getting blocked.

## 🚀 What does this workflow do?

- **Automatically finds businesses** using the Google Places API based on a category and location you define.
- **Calculates a leadScore** for each business based on its rating, website presence, and operational status to prioritize high-quality leads.
- **Filters out low-quality leads** to ensure you only focus on the most promising prospects.
- **Reliably scrapes the website** of each high-quality lead using Scrape.do to bypass common blocking issues and retrieve the raw HTML.
- **Uses an AI Agent (OpenAI)** to intelligently parse the website's HTML and extract hard-to-find contact details (emails, social media links, phone numbers).
- **Saves all enriched lead data** to a Google Sheet, creating a clean, actionable list for your sales or marketing team.
- **Runs on a schedule**, continuously finding new leads without any manual effort.

## 🎯 Who is this for?

- **Sales & Business Development Teams:** Automate prospecting and build targeted lead lists.
- **Marketing Agencies:** Generate leads for clients in specific industries and locations.
- **Freelancers & Consultants:** Quickly find potential clients for your services.
- **Startups & Small Businesses:** Build a customer database without spending hours on manual research.

## ✨ Benefits

- **Full Automation:** Set it up once and let it run on a schedule to continuously fill your pipeline.
- **AI-Powered Enrichment:** Go beyond basic business info. Get actual emails and social profiles that aren't available on Google Maps.
- **Reliable Website Access:** Leverages **Scrape.do** to handle proxies and prevent IP blocks, ensuring consistent data gathering from target websites.
- **High-Quality Leads:** The built-in scoring and filtering system ensures you don't waste time on irrelevant or incomplete listings.
- **Centralized Database:** All your leads are automatically organized in a single, easy-to-access Google Sheet.

## ⚙️ How it Works

1. **Schedule Trigger:** The workflow starts automatically at your chosen interval (e.g., daily).
2. **Set Parameters:** You define the business type (searchCategory) and location (locationName) in a central Set node.
3. **Find Businesses:** It calls the Google Places API to get a list of businesses matching your criteria.
4. **Score & Filter:** A custom Function node scores each lead (see the sketch below). An IF node then separates high-quality leads from low-quality ones.
5. **Loop & Enrich:** The workflow processes each high-quality lead one by one. It uses a scraping service (Scrape.do) to reliably fetch the lead's website content. An AI Agent (OpenAI) analyzes the website's footer to find contact and social media links.
6. **Save Data:** The final, enriched lead information is appended as a new row in your Google Sheet.
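The scoring itself lives in the Function node from step 4. A minimal sketch of that kind of logic follows; the weights, thresholds, and exact field handling here are illustrative assumptions, not the template's actual values:

```javascript
// Illustrative lead scoring for an n8n Code/Function node.
// Weights and thresholds are assumptions; adjust them to your own criteria.
return $input.all().map((item) => {
  const place = item.json; // one Google Places result

  let leadScore = 0;
  if (place.rating >= 4.0) leadScore += 3;            // well-reviewed business
  else if (place.rating >= 3.0) leadScore += 1;

  if (place.user_ratings_total >= 20) leadScore += 2; // enough reviews to trust the rating
  if (place.website) leadScore += 3;                  // has a website we can enrich
  if (place.business_status === 'OPERATIONAL') leadScore += 2;

  return { json: { ...place, leadScore } };
});
```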
## 📋 n8n Nodes Used

- Schedule Trigger
- Set
- HTTP Request (for Google Places & Scrape.do)
- Function
- If
- Split in Batches (Loop Over Items)
- HTML
- Langchain Agent (with OpenAI Chat Model & Structured Output Parser)
- Google Sheets

## 🔑 Prerequisites

- An active n8n instance.
- **Google Cloud Project** with the **Places API** enabled.
- **Google Places API Key**, stored in n8n's Header Auth credentials.
- **A Scrape.do account and API Token.** This is essential for reliably scraping websites without your n8n server's IP getting blocked.
- **OpenAI Account & API Key** for the AI-powered data extraction.
- **Google Account** with access to Google Sheets.
- **Google Sheets API Credentials (OAuth2)** configured in n8n.
- **A Google Sheet** prepared with columns to store the lead data (e.g., BusinessName, Address, Phone, Website, Email, Facebook, etc.).

## 🛠️ Setup

1. Import the workflow into your n8n instance.
2. Configure credentials. Create and/or select your credentials for:
   - **Google Places API:** In the "2. Find Businesses (Google Places)" node, select your Header Auth credential containing your API key.
   - **Scrape.do:** In the "6a. Scrape Website HTML" node, configure credentials for your Scrape.do account.
   - **OpenAI:** In the OpenAI Chat Model node, select your OpenAI credentials.
   - **Google Sheets:** In the "7. Save to Google Sheets" node, select your Google Sheets OAuth2 credentials.
3. Define your search: In the "1. Set Search Parameters" node, update the searchCategory and locationName values to match your target market.
4. Link your Google Sheet: In the "7. Save to Google Sheets" node, select your Spreadsheet and Sheet Name from the dropdown lists. Map the incoming data to the correct columns in your sheet.
5. Set your schedule: Adjust the Schedule Trigger to run as often as you like (e.g., once a day).
6. Activate the workflow! Your automated lead generation will begin on the next scheduled run.
by Colton Randolph
This n8n workflow automatically scrapes TechCrunch articles, filters for AI-related content using OpenAI, and delivers curated summaries to your Slack channels. Perfect for individuals or teams who need to stay current on artificial intelligence developments without manually browsing tech news sites.

## Who's it for

- AI product teams tracking industry developments and competitive moves
- Tech investors monitoring AI startup coverage and funding announcements
- Marketing teams following AI trends for content and positioning strategies
- Executives needing daily AI industry briefings without manual research overhead
- Development teams staying current on AI tools, frameworks, and breakthrough technologies

## How it works

The workflow runs on a daily schedule, crawling a specified number of TechCrunch articles from the current year. Firecrawl extracts clean markdown content while bypassing anti-bot measures and handling JavaScript rendering automatically.

Each article gets analyzed by an AI research assistant that determines whether the content relates to artificial intelligence, machine learning, AI companies, or AI technology. Articles marked as "NOT_AI_RELATED" get filtered out automatically.

For AI-relevant articles, OpenAI generates focused 3-bullet-point summaries that capture key insights. These summaries get delivered to your specified Slack channel with the original TechCrunch article title and source link for deeper reading.

## How to set up

1. **Configure Firecrawl:** Add your Firecrawl API key to the HTTP Request node
2. **Set OpenAI credentials:** Add your OpenAI API key to the AI Agent node
3. **Connect Slack:** Configure your Slack webhook URL and target channel
4. **Adjust scheduling:** Set your preferred trigger frequency (daily recommended)
5. **Test the workflow:** Run manually to verify article extraction and Slack delivery

## Requirements

- **Firecrawl account** with API access for TechCrunch web scraping
- **OpenAI API key** for AI content analysis and summarization
- **Slack workspace** with webhook permissions for message delivery
- **n8n instance** (cloud or self-hosted) for workflow execution

## How to customize the workflow

- **Source expansion:** Modify the HTTP node URL to target additional tech publications beyond TechCrunch, or adjust the article limit and date filtering for different coverage needs.
- **AI focus refinement:** Update the OpenAI prompt to focus on specific AI verticals like generative AI, robotics, or ML infrastructure. Add company names or technology terms to the relevance filtering logic (see the sketch below).
- **Summary formats:** Change from 3-bullet summaries to executive briefs, technical analyses, or competitive intelligence reports by modifying the OpenAI summarization prompt.
- **Multi-channel delivery:** Extend beyond Slack to email notifications, Microsoft Teams, or database storage for historical trend analysis and executive dashboards.
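The relevance gate described under "How it works" is a simple check on the classifier's verdict. A minimal sketch for an n8n Code node, assuming the AI step writes its verdict to a field such as `classification` (an assumption, not the template's exact field name):

```javascript
// Keep only AI-related articles; drop anything the classifier marked NOT_AI_RELATED.
// The `classification` field name is assumed.
return $input.all().filter(
  (item) => (item.json.classification || '').trim().toUpperCase() !== 'NOT_AI_RELATED'
);
```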
by Yang
# 🛍️ Pick Best-Value Products from Any Website Using Dumpling AI, GPT-4o, and Google Sheets

## Who's it for

This workflow is for eCommerce researchers, affiliate marketers, and anyone who needs to compare product listings across sites like Amazon. It's perfect for quickly identifying top product picks based on delivery speed, free shipping, and price.

## What it does

Just submit a product listing URL. The workflow will crawl it using Dumpling AI, take screenshots of the pages, and pass them to GPT-4o to extract up to 3 best-value picks. It analyzes screenshots visually—no HTML scraping needed.

Each result includes:

- product name
- price
- review count
- free delivery date (if available)

## How it works

1. 📝 Receives a URL through a web form
2. 🧠 Uses Dumpling AI to crawl the website
3. 📸 Takes screenshots of each product listing
4. 🔍 GPT-4o analyzes each image to pick the top products
5. 🔧 A code node parses and flattens the output (see the sketch below)
6. 📊 Google Sheets stores the result
7. 📧 Sends the spreadsheet link via email

## Requirements

- **Dumpling AI token**
- **OpenAI key** (GPT-4o)
- **Google Sheet** with columns: product name, price, reviews no., free_delivery_date

> You can customize the AI prompt to extract other visual insights (e.g., ratings, specs).
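Step 5's code node reshapes the model's answer into one row per pick for the Google Sheets node. A minimal sketch of that flattening, assuming GPT-4o returns a JSON object with a `picks` array and that the raw text arrives in an `output` field (both assumptions; match them to your actual prompt and node output):

```javascript
// Flatten an AI response like { picks: [{ name, price, reviews, freeDeliveryDate }, ...] }
// into one n8n item per product. Field names and shape are assumptions.
const raw = String($input.first().json.output || '');

// Keep only the JSON object portion, ignoring fences or commentary around it.
const start = raw.indexOf('{');
const end = raw.lastIndexOf('}');

let parsed;
try {
  parsed = JSON.parse(raw.slice(start, end + 1));
} catch (e) {
  return [{ json: { error: 'Could not parse AI output', raw } }];
}

return (parsed.picks || []).map((p) => ({
  json: {
    'product name': p.name,
    price: p.price,
    'reviews no.': p.reviews,
    free_delivery_date: p.freeDeliveryDate || '',
  },
}));
```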
by vinci-king-01
# Product Price Monitor with Mailchimp and Baserow

⚠️ **COMMUNITY TEMPLATE DISCLAIMER:** This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template.

This workflow scrapes multiple e-commerce sites for product pricing data, stores the historical prices in Baserow, analyzes weekly trends, and emails a neatly formatted seasonal report to your Mailchimp audience. It is designed for retailers who need to stay on top of seasonal pricing patterns to make informed inventory and pricing decisions.

## Pre-conditions/Requirements

### Prerequisites

- Running n8n instance (self-hosted or n8n cloud)
- ScrapeGraphAI community node installed
- Mailchimp account with at least one audience list
- Baserow workspace with edit rights
- Product URLs or SKU list from target e-commerce platforms

### Required Credentials

| Credential | Used By | Scope |
|------------|---------|-------|
| ScrapeGraphAI API Key | ScrapeGraphAI node | Web scraping |
| Mailchimp API Key & Server Prefix | Mailchimp node | Sending emails |
| Baserow API Token | Baserow node | Reading & writing records |

### Baserow Table Setup

Create a table named price_tracker with the following fields:

| Field Name | Type | Example |
|------------|------|---------|
| product_name | Text | "Winter Jacket" |
| product_url | URL | https://example.com/winter-jacket |
| current_price | Number | 59.99 |
| scrape_date | DateTime | 2023-11-15T08:21:00Z |

## How it works

Key Steps:

1. **Schedule Trigger**: Fires every week (or on a custom CRON schedule) to start the monitoring cycle.
2. **Code (Prepare URLs)**: Loads or constructs the list of product URLs to monitor.
3. **SplitInBatches**: Processes product URLs in manageable batches to avoid rate-limit issues.
4. **ScrapeGraphAI**: Scrapes each product page and extracts the current price and name.
5. **If (Price Found?)**: Continues only if scraping returns a valid price.
6. **Baserow**: Upserts the scraped data into the price_tracker table.
7. **Code (Trend Analysis)**: Aggregates weekly data to detect price increases, decreases, or stable trends.
8. **Set (Mail Content)**: Formats the trend summary into an HTML email body.
9. **Mailchimp**: Sends the seasonal price-trend report to the selected audience segment.
10. **Sticky Note**: Documentation node explaining the business logic in-workflow.

## Set up steps

Setup Time: 10-15 minutes

1. **Clone the template:** Import the workflow JSON into your n8n instance.
2. **Install ScrapeGraphAI:** Add n8n-nodes-scrapegraphai via the Community Nodes panel.
3. **Add credentials:** a. ScrapeGraphAI API Key, b. Mailchimp API Key & Server Prefix, c. Baserow API Token.
4. **Configure the Baserow node:** Point it to your price_tracker table.
5. **Edit the product list:** In the "Prepare URLs" Code node, replace the sample URLs with your own.
6. **Adjust the schedule:** Modify the Schedule Trigger CRON expression if weekly isn't suitable.
7. **Test run:** Execute the workflow manually once to verify credentials and data flow.
8. **Activate:** Turn on the workflow for automatic weekly monitoring.

## Node Descriptions

Core Workflow Nodes:

- **Schedule Trigger** – Initiates the workflow on a weekly CRON schedule.
- **Code (Prepare URLs)** – Generates an array of product URLs/SKUs to scrape.
- **SplitInBatches** – Splits the array into chunks of 5 URLs to stay within request limits.
- **ScrapeGraphAI** – Scrapes each URL, using XPath/CSS selectors to pull the price & title.
- **If (Price Found?)** – Filters out failed or empty scrape results.
- **Baserow** – Inserts or updates the price record in the database.
- **Code (Trend Analysis)** – Calculates week-over-week price changes and flags anomalies (see the sketch below).
- **Set (Mail Content)** – Creates an HTML table with product, current price, and trend arrow.
- **Mailchimp** – Sends or schedules the email campaign.
- **Sticky Note** – Provides inline documentation and edit hints.

Data Flow:

1. Schedule Trigger → Code (Prepare URLs) → SplitInBatches
2. SplitInBatches → ScrapeGraphAI → If (Price Found?) → Baserow
3. Baserow → Code (Trend Analysis) → Set (Mail Content) → Mailchimp

## Customization Examples

Change scraping frequency:

```
// Schedule Trigger CRON for daily at 07:00 UTC
0 7 * * *
```

Add a competitor comparison column:

```javascript
// Code (Trend Analysis)
item.competitor_price_diff = item.current_price - item.competitor_price;
return item;
```

## Data Output Format

The workflow outputs structured JSON data:

```json
{
  "product_name": "Winter Jacket",
  "product_url": "https://example.com/winter-jacket",
  "current_price": 59.99,
  "scrape_date": "2023-11-15T08:21:00Z",
  "weekly_trend": "decrease"
}
```

## Troubleshooting

Common Issues:

- **Invalid ScrapeGraphAI key** – Verify the API key and ensure your subscription is active.
- **Mailchimp "Invalid Audience" error** – Double-check the audience ID and that the API key has the correct permissions.
- **Baserow "Field mismatch"** – Confirm your table fields match the names/types in the workflow.

Performance Tips:

- Limit each SplitInBatches run to ≤10 URLs to reduce scraping timeouts.
- Enable caching in ScrapeGraphAI to avoid repeated requests to the same URL within short intervals.

Pro Tips:

- Use environment variables for all API keys to avoid hard-coding secrets.
- Add an extra If node to alert you if a product's price drops below a target threshold.
- Combine with n8n's Slack node for real-time alerts in addition to Mailchimp summaries.
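For context, the week-over-week comparison in the Code (Trend Analysis) node could be sketched as follows. This is an illustrative version, assuming the previous week's price has already been fetched from Baserow into a `previous_price` field (an assumption about field names, not the template's exact wiring):

```javascript
// Illustrative week-over-week trend check for the Code (Trend Analysis) node.
return $input.all().map((item) => {
  const { current_price, previous_price } = item.json;
  let weekly_trend = 'stable';
  let change_pct = 0;

  if (typeof previous_price === 'number' && previous_price > 0) {
    change_pct = ((current_price - previous_price) / previous_price) * 100;
    if (change_pct > 2) weekly_trend = 'increase';       // assumed 2% noise threshold
    else if (change_pct < -2) weekly_trend = 'decrease';
  }

  return {
    json: {
      ...item.json,
      weekly_trend,
      change_pct: Math.round(change_pct * 10) / 10,
    },
  };
});
```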
by vinci-king-01
# Product Price Monitor with Mailgun and MongoDB

⚠️ **COMMUNITY TEMPLATE DISCLAIMER:** This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template.

This workflow automatically scrapes multiple e-commerce sites, records weekly product prices in MongoDB, analyzes seasonal trends, and emails a concise report to retail stakeholders via Mailgun. It helps retailers make informed inventory and pricing decisions by providing up-to-date pricing intelligence.

## Pre-conditions/Requirements

### Prerequisites

- n8n instance (self-hosted, desktop, or n8n.cloud)
- ScrapeGraphAI community node installed and activated
- MongoDB database (Atlas or self-hosted)
- Mailgun account with a verified domain
- Publicly reachable n8n Webhook URL (if self-hosted)

### Required Credentials

- **ScrapeGraphAI API Key** – Enables web scraping across target sites
- **MongoDB Credentials** – Connection string (MongoDB URI) with read/write access
- **Mailgun API Key & Domain** – To send summary emails

### MongoDB Collection Schema

| Field | Type | Example Value | Notes |
|-------|------|---------------|-------|
| productId | String | SKU-12345 | Unique identifier you define |
| productName | String | Women's Winter Jacket | Human-readable name |
| timestamp | Date | 2024-09-15T00:00:00Z | Ingest date (automatically added) |
| price | Number | 79.99 | Scraped price |
| source | String | example-shop.com | Domain where the price was scraped |

## How it works

Key Steps:

1. **Webhook Trigger**: Starts the workflow on a scheduled HTTP call or manual trigger.
2. **Code (Prepare Products)**: Defines the list of SKUs/URLs to monitor.
3. **Split In Batches**: Processes products in manageable chunks to respect rate limits.
4. **ScrapeGraphAI (Scrape Price)**: Extracts price, availability, and currency from each product URL.
5. **Merge (Combine Results)**: Re-assembles all batch outputs into one dataset.
6. **MongoDB (Upsert Price History)**: Stores each price point for historical analysis.
7. **If (Seasonal Trend Check)**: Compares the current price against the historical average to detect anomalies.
8. **Set (Email Payload)**: Formats the trend report for email.
9. **Mailgun (Send Email)**: Emails the weekly summary to the specified recipients.
10. **Respond to Webhook**: Returns a "200 OK – Report Sent" response for logging.

## Set up steps

Setup Time: 15-20 minutes

1. **Install Community Node:** In n8n, go to "Settings → Community Nodes" and install @n8n-community/nodes-scrapegraphai.
2. **Create Credentials:**
   - Add the ScrapeGraphAI API key under Credentials.
   - Add MongoDB credentials (type: MongoDB).
   - Add Mailgun credentials (type: Mailgun).
3. **Import Workflow:** Download the JSON template, then in n8n click "Import" and select the file.
4. **Configure Product List:** Open the Code (Prepare Products) node and replace the example array with your product objects { id, name, url }.
5. **Adjust Cron/Schedule:** If you prefer a fully automated schedule, replace the Webhook with a Cron node (e.g., every Monday at 09:00).
6. **Verify MongoDB Collection:** Ensure the collection (default: productPrices) exists or let n8n create it on first run.
7. **Set Recipients:** In the Mailgun node, update the to, from, and subject fields.
8. **Execute Test Run:** Manually trigger the Webhook URL or run the workflow once to verify data flow and email delivery.
9. **Activate:** Toggle the workflow to "Active" so it runs automatically each week.

## Node Descriptions

Core Workflow Nodes:

- **Webhook** – Entry point that accepts a GET/POST call to start the job.
- **Code (Prepare Products)** – Outputs an array of products to monitor.
- **Split In Batches** – Limits scraping to N products per request to avoid banning.
- **ScrapeGraphAI** – Scrapes the HTML of a product page and parses pricing data.
- **Merge** – Re-combines batch results for streamlined processing.
- **MongoDB** – Inserts or updates each product's price history document.
- **If** – Determines whether the price deviates more than X% from the seasonal average.
- **Set** – Builds an HTML/text email body containing the findings.
- **Mailgun** – Sends the email via the Mailgun REST API.
- **Respond to Webhook** – Returns an HTTP response for logging/monitoring.
- **Sticky Notes** – Provide in-workflow documentation (no execution).

Data Flow:

1. Webhook → Code → Split In Batches
2. Split In Batches → ScrapeGraphAI → Merge
3. Merge → MongoDB → If
4. If (true) → Set → Mailgun → Respond to Webhook

## Customization Examples

Change scraping frequency (Cron):

```
// Cron node settings
{
  "mode": "custom",
  "cronExpression": "0 6 * * 1,4" // Monday & Thursday 06:00
}
```

Extend the data points (review count, stock):

```
// In the ScrapeGraphAI extraction config
{
  "price": "css:span.price",
  "inStock": "css:div.availability",
  "reviewCount": "regex:\"(\\d+) reviews\""
}
```

## Data Output Format

The workflow outputs structured JSON data:

```json
{
  "productId": "SKU-12345",
  "productName": "Women's Winter Jacket",
  "timestamp": "2024-09-15T00:00:00Z",
  "price": 79.99,
  "currency": "USD",
  "source": "example-shop.com",
  "trend": "5% below 3-month average"
}
```

## Troubleshooting

Common Issues:

- **ScrapeGraphAI returns empty data** – Confirm the selectors/XPath are correct; test with the ScrapeGraphAI playground.
- **MongoDB connection fails** – Verify IP whitelisting for Atlas or network connectivity for a self-hosted instance.
- **Mail not delivered** – Check the Mailgun logs for bounces or spam rejection, and ensure the from domain is verified.

Performance Tips:

- Use smaller batch sizes (e.g., 5 URLs) to avoid target-site rate-limit blocks.
- Cache static product info; scrape only fields that change (price, stock).

Pro Tips:

- Integrate the If node with n8n's Slack node to push urgent price drops to a channel.
- Add a Function node to calculate moving averages for deeper analysis (see the sketch below).
- Store raw HTML snapshots in S3/MinIO for auditability and debugging.
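The seasonal trend check (and the moving-average idea from the pro tips) comes down to comparing the latest price with an average of recent history. A minimal sketch, assuming prior price points have already been queried from MongoDB into a `history` array on each item (an assumption about field names, not the template's exact wiring):

```javascript
// Illustrative seasonal-deviation check for a Code/Function node.
const THRESHOLD_PCT = 10; // flag deviations larger than 10% (assumed value)

return $input.all().map((item) => {
  const history = item.json.history || [];
  const prices = history.map((h) => h.price).filter((p) => typeof p === 'number');
  const avg = prices.length
    ? prices.reduce((sum, p) => sum + p, 0) / prices.length
    : item.json.price;

  const deviationPct = avg ? ((item.json.price - avg) / avg) * 100 : 0;

  return {
    json: {
      ...item.json,
      seasonalAverage: Math.round(avg * 100) / 100,
      deviationPct: Math.round(deviationPct * 10) / 10,
      isAnomaly: Math.abs(deviationPct) > THRESHOLD_PCT,
    },
  };
});
```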
by Mirai
# Icebreaker Generator powered with ChatGPT

This n8n template crawls a company website, distills the content with AI, and produces a short, personalized icebreaker you can drop straight into your cold emails or CRM. Perfect for SDRs, founders, and agencies who want "real research" at scale.

## Good to know

- Works from a Google Sheet of leads (domain + LinkedIn, etc.).
- Handles common scrape failures gracefully and marks the lead's Status as Error.
- Uses ChatGPT to summarize pages and craft one concise, non-generic opener.
- Output is written back to the same Google Sheet (IceBreaker, Status).
- You'll need Google credentials (for Sheets) and OpenAI credentials (for GPT).

## How it works

**Step 1 — Discover internal pages**

- Reads a lead's website from Google Sheets.
- Scrapes the home page and extracts all links.
- A Code node cleans the list (removes emails/anchors/social/external domains, normalizes paths, de-duplicates) and returns unique internal URLs (see the sketch at the end of this template).
- If the home page is unreachable or no links are found, the lead is marked Error and the workflow moves on.

**Step 2 — Convert pages to text**

- Visits each collected URL and converts the response into HTML/Markdown text for analysis.
- You can cap the depth/amount with the Limit node.

**Step 3 — Summarize & generate the icebreaker**

- A GPT node produces a two-paragraph abstract for each page (JSON output).
- An Aggregate node merges all abstracts for the company.
- Another GPT node turns the merged summary into a personalized, multi-line icebreaker (spartan tone, non-obvious details).
- The result is written back to Google Sheets (IceBreaker = ..., Status = Done). The workflow loops to the next lead.

## How to use

1. **Prepare your sheet**
   - Include at least: organization_website_url, linkedin_url, and any other lead fields you track.
   - Keep an empty IceBreaker and Status column for the workflow to fill.
2. **Connect credentials**
   - Google Sheets: use the Google account that owns the sheet and link it in the nodes.
   - OpenAI: add your API key to the GPT nodes ("Summarize Website Page", "Generate Multiline Icebreaker").
3. **Run the workflow**
   - Start with the Manual Trigger (or replace it with a schedule/webhook).
   - Adjust Limit if you want fewer/more pages per company.
   - Watch Status (Done/Error) and IceBreaker populate in your sheet.

## Requirements

- n8n instance
- Google Sheets account & access to the leads sheet
- OpenAI API key (for summarization + icebreaker generation)

## Customizing this workflow

- **Tone & format:** tweak the prompts (both GPT nodes) to match your brand voice and structure.
- **Depth:** change the Limit node to scan more/fewer pages; add simple rules to prioritize certain paths (e.g., /about, /blog/*).
- **Fields:** write additional outputs (e.g., Company Summary, Key Products, Recent News) back to new sheet columns.
- **Lead selection:** filter rows by Status = "" (or custom flags) to only process untouched leads.
- **Error handling:** expand the Error branch to retry with www. or HTTP→HTTPS variants, or to log diagnostics in a separate tab.

## Tips

- Keep icebreakers short, specific, and free of clichés—small, non-obvious details from the site convert best.
- Start with a small batch to validate quality, then scale up.
- Consider adding a rate limit if target sites throttle requests.

**In short:** Sheet → crawl internal pages → AI abstracts → single tailored icebreaker → write back to the sheet, then repeat for the next lead. This template pairs well with our automated cold emailing workflow.
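A minimal sketch of the kind of link cleaning Step 1 describes. The exact rules in the template may differ, and the input field names (`organization_website_url`, `links`) are assumptions:

```javascript
// Illustrative link cleaner for an n8n Code node.
const input = $input.first().json;
const base = new URL(input.organization_website_url);
const rawLinks = input.links || []; // raw hrefs extracted from the home page

const SOCIAL = ['facebook.com', 'twitter.com', 'x.com', 'linkedin.com', 'instagram.com', 'youtube.com'];
const internal = new Set();

for (const href of rawLinks) {
  if (!href || href.startsWith('mailto:') || href.startsWith('tel:') || href.startsWith('#')) continue;

  let url;
  try {
    url = new URL(href, base); // resolve relative paths against the home page
  } catch (e) {
    continue; // skip malformed links
  }

  // Drop external domains and social profiles
  if (url.hostname.replace(/^www\./, '') !== base.hostname.replace(/^www\./, '')) continue;
  if (SOCIAL.some((s) => url.hostname.includes(s))) continue;

  url.hash = '';                           // drop anchors
  internal.add(url.origin + url.pathname); // normalize & de-duplicate
}

return [...internal].map((link) => ({ json: { link } }));
```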
by Harsh Agrawal
# Automated SEO Intelligence Platform with DataForSEO and Claude

Transform any company website into a detailed SEO audit report in minutes! This workflow combines real-time web scraping, comprehensive SEO data analysis, and advanced AI reasoning to deliver client-ready reports automatically. Perfect for digital agencies scaling their audit services, freelance SEO consultants automating research, or SaaS teams analyzing competitor strategies before sales calls.

## The Process

- **Discovery Phase:** Input a company name and website URL to kick things off. The system begins with website content extraction.
- **Intelligence Gathering:** A dedicated scraper sub-workflow extracts all website content and converts it to structured markdown.
- **Strategic Analysis:** LLMs process the scraped content to understand the business model, target market, and competitive positioning. They generate business research insights and product strategy recommendations tailored to that specific company. Once this analysis completes, the DataForSEO API pulls technical metrics, backlink profiles, keyword rankings, and site health indicators.
- **Report Assembly:** All findings flow into a master report generator that structures the data into sections covering technical SEO, content strategy, competitive landscape, and actionable next steps. Custom branded cover and closing pages are added.
- **Delivery:** The HTML report converts to PDF format and emails directly to your recipient - no manual intervention needed.

## Setup Steps

1. Add API credentials: OpenRouter (for AI), DataForSEO (for scraping/SEO data), and PDFco (for PDF generation)
2. Configure email sending through your preferred service (Gmail, SendGrid, etc.)
3. Optional: Upload custom first/last page PDFs for white-label branding
4. Test with your own website first to see the magic happen!

## Customize It

- **Adjust analysis depth:** Modify the AI prompts to focus on specific SEO aspects (local SEO, e-commerce, B2B SaaS, etc.)
- **Change report style:** Edit the HTML template in the Sample_Code node for different formatting
- **Add integrations:** Connect to your CRM to automatically trigger reports when leads enter your pipeline
- **Scale it up:** Process multiple URLs in batch by feeding in a Google Sheet of prospects

## What You'll Need

- OpenRouter account (Claude Opus 4.1 recommended for best insights)
- DataForSEO subscription (handles both scraping and SEO metrics)
- PDFco account (converts your reports to professional PDFs)
- Email service credentials configured in n8n

## Need Help?

Connect with me on LinkedIn if you have any doubts.
by Kevin Meneses
## What this workflow does

This workflow automatically monitors eBay Deals and sends Telegram alerts when relevant, high-quality deals are detected. It combines:

- **Web scraping with Decodo**
- **JavaScript pre-processing (no raw HTML sent to the LLM)**
- **AI-based product classification and deal scoring**
- **Rule-based filtering using price and score**

Only valuable deals reach the final notification.

## How it works (overview)

1. The workflow runs manually or on a schedule.
2. The eBay Deals page is scraped using Decodo, which handles proxies and anti-bot protections (Decodo – Web Scraper for n8n).
3. JavaScript extracts only key product data (ID, title, price, URL, image).
4. An AI Agent classifies each product and assigns a deal quality score (0–10).
5. Price and score rules are applied.
6. Matching deals are sent to Telegram.

## How to configure it

### 1. Decodo

- Add your Decodo API credentials to the Decodo node.
- Optionally change the target eBay URL.

### 2. AI Agent

- Add your LLM credentials (e.g. Google Gemini).
- No HTML is sent to the model — only compact, structured data.

### 3. Telegram

- Add your Telegram Bot Token.
- Set your chat_id in the Telegram node.
- Customize the alert message if needed.

### 4. Filtering rules

- Adjust the price limits and minimum deal score in the IF node (see the sketch below).
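The filtering rules in step 4 are just numeric comparisons on the scored items. As an illustration (the thresholds and field names such as `dealScore` are assumptions; in the template they live in the IF node):

```javascript
// Illustrative price/score filter, equivalent to the IF node's rules.
const MAX_PRICE = 200; // only alert on deals under this price (assumed value)
const MIN_SCORE = 7;   // minimum AI deal-quality score on the 0–10 scale (assumed value)

return $input.all().filter((item) => {
  const { price, dealScore } = item.json;
  return typeof price === 'number' && price <= MAX_PRICE && dealScore >= MIN_SCORE;
});
```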
by Ranjan Dailata
## Who this is for

This workflow is designed for:

- Automation engineers building AI-powered data pipelines
- Product managers & analysts needing structured insights from web pages
- Researchers & content teams extracting summaries from documentation or articles
- HR, compliance, and knowledge teams converting unstructured web content into structured records
- n8n self-hosted users leveraging advanced scraping and LLM enrichment

It is ideal for anyone who wants to transform any public URL into structured data plus clean summaries automatically.

## What problem this workflow solves

Web content is often unstructured, verbose, and inconsistent, making it difficult to:

- Extract structured fields reliably
- Generate consistent summaries
- Reuse data across spreadsheets, dashboards, or databases
- Eliminate manual copy-paste and interpretation

This workflow solves the problem of turning arbitrary web pages into machine-readable JSON and human-readable summaries, without custom scrapers or manual parsing logic.

## What this workflow does

The workflow integrates Decodo, Google Gemini, and Google Sheets to perform automated extraction of structured data. Here's how it works step by step:

1. **Input Setup** - The workflow begins when the user executes it manually or passes a valid URL. The input includes url.
2. **Profile Extraction with Decodo** - Accepts any valid URL as input and scrapes the page content using Decodo.
3. **AI Enrichment with Google Gemini** - Extracts structured data in JSON format and generates a concise, factual summary.
4. **JSON Parsing & Merging** - The Code node cleans and parses the AI-generated JSON safely for reliable downstream use (see the sketch after the setup steps); the Merge node combines the structured data with the AI-generated summary.
5. **Data Storage in Google Sheets** - The Google Sheets node appends or updates the record, storing the structured JSON and summary in a connected spreadsheet.
6. **End Output** - Unified, machine-readable JSON plus an executive-level summary suitable for data analysis or downstream automation.

## Setup Instructions

### Prerequisites

- **n8n account** with workflow editor access
- **Decodo API credentials** - Register, log in, and obtain the Basic Authentication Token via the Decodo Dashboard
- **Google Gemini (PaLM) API access**
- **Google Sheets OAuth credentials**

### Setup Steps

1. Import the workflow into your n8n instance.
2. Configure credentials:
   - Add your Decodo API credentials in the Decodo node.
   - Connect your Google Gemini (PaLM) credentials for both AI nodes.
   - Authenticate your Google Sheets account.
3. Edit the input node: in the Set the Input Fields node, replace the default URL with your desired profile or dynamic data source.
4. Run the workflow: trigger it manually or via webhook integration for automation. Verify that the structured profile data and summary are written to the linked Google Sheet.
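The safe-parsing step typically has to cope with models wrapping JSON in markdown fences or adding stray text around it. A minimal sketch of that cleanup for an n8n Code node; the input field name (`output`) is an assumption, not the template's exact field:

```javascript
// Illustrative "clean and parse AI JSON" step.
const raw = String($input.first().json.output || '');

// Keep only the JSON object portion, ignoring fences or commentary around it.
const start = raw.indexOf('{');
const end = raw.lastIndexOf('}');

let structured = {};
let parseError = null;

if (start !== -1 && end > start) {
  try {
    structured = JSON.parse(raw.slice(start, end + 1));
  } catch (e) {
    parseError = e.message;
  }
} else {
  parseError = 'No JSON object found in model output';
}

return [{ json: { structured, parseError } }];
```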
## How to customize this workflow to your needs

You can easily extend or adapt this workflow:

- **Modify the structured output** - Change the Gemini extraction prompt to match your own JSON schema; add required fields such as authors, dates, entities, or metadata.
- **Improve summarization** - Adjust the summary length or tone (technical, executive, simplified); add multi-language summarization using Gemini.
- **Change the output destination** - Replace Google Sheets with databases (Postgres, MySQL), Notion, Slack/Email, or file storage (JSON, CSV).
- **Add validation or filtering** - Insert IF nodes to reject incomplete data, detect errors or hallucinated output, or trigger alerts for malformed JSON.
- **Scale the workflow** - Replace the manual trigger with a webhook, a scheduled trigger, or batch URL processing.

## Summary

This workflow provides a powerful, generic solution for converting unstructured web pages into structured, AI-enriched datasets. By combining Decodo for scraping, Google Gemini for intelligence, and Google Sheets for persistence, it enables repeatable, scalable, and production-ready data extraction without custom scrapers or brittle parsing logic.