by BytezTech
Quick overview

This workflow runs weekly and crawls your website sitemap, scrapes each page, generates page-specific FAQs with OpenAI GPT-4o, embeds the Q&A content using OpenAI text-embedding-3-small, and upserts the vectors into a Pinecone index to keep a RAG knowledge base in sync.

How it works

- A weekly Schedule Trigger fires every Monday at midnight IST (cron: 30 18 * * 0, which is Sunday 18:30 when the instance clock runs in UTC) to start the sync pipeline automatically.
- The workflow fetches your XML sitemap index, parses it, and extracts all sub-sitemap URLs to discover every page on your website.
- All page URLs are merged, deduplicated, and filtered to remove assets, CDN files, admin paths, and third-party links, then batched in groups of 10 for efficient processing.
- Each page URL is scraped as raw HTML. Scripts, styles, nav, and footer tags are stripped, and clean content (title, meta description, H1–H3 headings, paragraphs, list items) is extracted up to 5,000 characters. Pages with fewer than 100 characters are skipped.
- The extracted page content is sent to GPT-4o with a structured prompt that generates topic-tagged FAQ pairs in JSON format (question, answer, topic, author). Each chunk gets a deterministic chunk_id based on URL + index to ensure idempotent re-runs.
- Each FAQ chunk is embedded using text-embedding-3-small (1536 dimensions) and upserted into Pinecone using the chunk_id as the vector ID. A 2-second wait between batches reduces the risk of API rate-limit errors.

Setup

- Connect your OpenAI API credential, which is used for both GPT-4o FAQ generation and text-embedding-3-small embeddings. Select this credential in all OpenAI nodes inside the workflow.
- Connect your Pinecone API credential. Make sure your Pinecone index is already created with 1536 dimensions before running the workflow.
- Open the "Get Sitemap Index" node and replace the placeholder URL with your actual XML sitemap URL (e.g. https://yoursite.com/sitemap_index.xml).
- Open the "Upsert FAQ Chunks to Pinecone" node and set your Pinecone index name and the namespace where FAQ vectors should be stored.
- Activate the workflow. It will run automatically every Monday at midnight IST, or you can trigger it manually anytime using the "Test Workflow" button.

Requirements

- OpenAI API key (GPT-4o access + Embeddings API)
- Pinecone account with an index pre-created at 1536 dimensions
- A website with a valid XML sitemap index (e.g. sitemap_index.xml)
- n8n instance (cloud or self-hosted)

Customization

- Schedule Trigger: change the cron expression to adjust sync frequency (daily, bi-weekly, etc.)
- Build GPT Request node: edit the system prompt to match your brand tone, company name, or FAQ format
- Flatten & Filter All URLs node: modify the skipList array to exclude specific paths (e.g. /blog, /admin, /careers)
- Loop URLs in Batches node: increase batchSize if your site has 100+ pages and your API limits allow
- Pinecone namespace: use different namespaces to separate FAQs by language, region, or product line

Additional info

This workflow uses deterministic chunk_id values (URL + FAQ index), so every weekly re-run overwrites the existing Pinecone vector for the same URL and FAQ index instead of adding a duplicate. It is fully compatible with any RAG-based AI chatbot that reads from Pinecone, including n8n AI Agent workflows using the Pinecone Vector Store node.
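The URL filtering, batching, and deterministic chunk_id steps described above can be sketched roughly as follows. This is a minimal illustration, not the template's actual Code node: the skip patterns, the function names, and the choice of a truncated SHA-256 hash for the ID are all assumptions.

```javascript
const crypto = require("node:crypto");

// Hypothetical skip rules; the template's own skipList may differ.
const skipList = ["/wp-admin", "/admin", "/cdn-cgi"];
const assetPattern = /\.(png|jpe?g|gif|svg|webp|css|js|pdf|ico|xml)$/i;

// Keep only same-host HTML pages, dropping duplicates.
function filterUrls(urls, siteHost) {
  const seen = new Set();
  for (const raw of urls) {
    let url;
    try {
      url = new URL(raw);
    } catch {
      continue; // malformed entry
    }
    if (url.hostname !== siteHost) continue; // third-party link
    if (assetPattern.test(url.pathname)) continue; // static asset
    if (skipList.some((prefix) => url.pathname.startsWith(prefix))) continue;
    url.hash = "";
    seen.add(url.toString().replace(/\/$/, "")); // treat /page and /page/ as one
  }
  return [...seen];
}

// Split the URL list into groups of batchSize (10 in the description above).
function toBatches(items, batchSize = 10) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Same URL + FAQ index always yields the same ID, so an upsert overwrites.
function chunkId(pageUrl, faqIndex) {
  return crypto
    .createHash("sha256")
    .update(`${pageUrl}#${faqIndex}`)
    .digest("hex")
    .slice(0, 32);
}
```

Because the ID depends only on the page URL and the FAQ position, re-running the pipeline on an unchanged page produces the same vector IDs, which is what makes the upsert idempotent.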
by Iela Media
Quick overview

This workflow pulls Google Maps business results from SerpApi based on search queries stored in Airtable, visits each business website to extract contact emails, and adds the enriched business records back into Airtable.

How it works

- Runs manually and searches Airtable for Google Maps scrape queries (search term and GPS coordinates).
- Paginates each query (start offsets) and calls SerpApi’s Google Maps Search API to fetch local business results.
- Cleans and deduplicates the SerpApi results, then keeps only businesses that include a website URL.
- Generates a prioritized list of pages to check per website (homepage plus common /contact and /about variations) and requests each page over HTTP with redirects enabled.
- Extracts email addresses from the returned HTML, filters out placeholders and suspicious/system emails, and stops further page checks once a valid email is found (or records a “Website Down”/no-email outcome).
- Merges the extracted email back into the original Google Maps business data, removes unneeded fields, and upserts the final record into Airtable.

Setup

- Add a SerpApi credential and ensure your SerpApi plan supports Google Maps Search API requests.
- Add an Airtable Personal Access Token credential with access to the bases/tables used for “Google Maps Scrape Queries” (input) and “Google Maps Scraping” (output).
- Update the Airtable base/table IDs and the expected fields for the query table (for example, “Search Query” and “GPS Coordinates”).
- Review the Airtable upsert matching field (position) and adjust it if you need a different unique key to prevent unwanted overwrites.
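The page-prioritization and email-extraction steps can be illustrated with a short sketch. The path list, the regular expression, and the blocklists below are assumptions chosen for illustration; the template's own filters are not given in the description and may be stricter.

```javascript
// Hypothetical page order: homepage first, then common contact/about paths.
function candidatePages(website) {
  const base = new URL(website).origin;
  return ["/", "/contact", "/contact-us", "/about", "/about-us"].map((p) => base + p);
}

const EMAIL_RE = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
// Hypothetical blocklists for system and placeholder addresses.
const BLOCKED_LOCAL = ["noreply", "no-reply", "donotreply", "postmaster"];
const BLOCKED_DOMAINS = ["example.com", "domain.com", "sentry.io"];
const ASSET_SUFFIX = /\.(png|jpe?g|gif|svg|webp)$/i;

// Return unique, lower-cased addresses that pass the filters.
function extractEmails(html) {
  const found = html.match(EMAIL_RE) || [];
  const unique = [...new Set(found.map((e) => e.toLowerCase()))];
  return unique.filter((email) => {
    const [local, domain] = email.split("@");
    if (ASSET_SUFFIX.test(email)) return false; // e.g. logo@2x.png
    if (BLOCKED_LOCAL.includes(local)) return false;
    if (BLOCKED_DOMAINS.includes(domain)) return false;
    return true;
  });
}

// Stop at the first page that yields a valid address; null means no email.
function firstValidEmail(pagesHtml) {
  for (const html of pagesHtml) {
    const emails = extractEmails(html);
    if (emails.length > 0) return emails[0];
  }
  return null;
}
```

Returning on the first hit mirrors the "stop further page checks once a valid email is found" behaviour and avoids requesting pages that are no longer needed.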
by Kevin Meneses
What this workflow does

This workflow automatically monitors eBay Deals and sends Telegram alerts when relevant, high-quality deals are detected. It combines:

- **Web scraping with Decodo**
- **JavaScript pre-processing (no raw HTML sent to the LLM)**
- **AI-based product classification and deal scoring**
- **Rule-based filtering using price and score**

Only valuable deals reach the final notification.

How it works (overview)

1. The workflow runs manually or on a schedule.
2. The eBay Deals page is scraped using Decodo (Decodo – Web Scraper for n8n), which handles proxies and anti-bot protections.
3. JavaScript extracts only key product data (ID, title, price, URL, image).
4. An AI Agent classifies each product and assigns a deal quality score (0–10).
5. Price and score rules are applied.
6. Matching deals are sent to Telegram.

How to configure it

1. Decodo
   - Add your Decodo API credentials to the Decodo node.
   - Optionally change the target eBay URL.
2. AI Agent
   - Add your LLM credentials (e.g. Google Gemini).
   - No HTML is sent to the model, only compact, structured data.
3. Telegram
   - Add your Telegram Bot Token.
   - Set your chat_id in the Telegram node.
   - Customize the alert message if needed.
4. Filtering rules
   - Adjust price limits and minimum deal score in the IF node.
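The price and score rules applied after the AI Agent could look something like the sketch below. The thresholds, field names, and message layout are hypothetical; in the template these rules live in the IF node rather than in code.

```javascript
// Hypothetical thresholds; adjust to match the values set in the IF node.
const rules = { maxPrice: 100, minScore: 7 };

// Turn a scraped price string such as "$1,299.00" into a number.
function parsePrice(text) {
  const match = String(text).replace(/,/g, "").match(/\d+(\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// Keep only products that satisfy both the price cap and the minimum score.
function selectDeals(products, { maxPrice, minScore }) {
  return products.filter((product) => {
    const price = parsePrice(product.price);
    return price !== null && price <= maxPrice && product.score >= minScore;
  });
}

// Build a plain-text Telegram message for one deal.
function formatAlert(deal) {
  return `${deal.title}\nPrice: ${deal.price}\nScore: ${deal.score}/10\n${deal.url}`;
}
```

Products whose price cannot be parsed are dropped rather than passed through, so a scraping glitch does not trigger a false alert.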
by vinci-king-01
Product Price Monitor with Mailchimp and Baserow

⚠️ COMMUNITY TEMPLATE DISCLAIMER: This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template.

This workflow scrapes multiple e-commerce sites for product pricing data, stores the historical prices in Baserow, analyzes weekly trends, and emails a neatly formatted seasonal report to your Mailchimp audience. It is designed for retailers who need to stay on top of seasonal pricing patterns to make informed inventory and pricing decisions.

Pre-conditions/Requirements

Prerequisites

- Running n8n instance (self-hosted or n8n cloud)
- ScrapeGraphAI community node installed
- Mailchimp account with at least one audience list
- Baserow workspace with edit rights
- Product URLs or SKU list from target e-commerce platforms

Required Credentials

| Credential | Used By | Scope |
|------------|---------|-------|
| ScrapeGraphAI API Key | ScrapeGraphAI node | Web scraping |
| Mailchimp API Key & Server Prefix | Mailchimp node | Sending emails |
| Baserow API Token | Baserow node | Reading & writing records |

Baserow Table Setup

Create a table named price_tracker with the following fields:

| Field Name | Type | Example |
|------------|------|---------|
| product_name | Text | “Winter Jacket” |
| product_url | URL | https://example.com/winter-jacket |
| current_price | Number | 59.99 |
| scrape_date | DateTime | 2023-11-15T08:21:00Z |

How it works

Key Steps:

- **Schedule Trigger**: Fires every week (or custom CRON) to start the monitoring cycle.
- **Code (Prepare URLs)**: Loads or constructs the list of product URLs to monitor.
- **SplitInBatches**: Processes product URLs in manageable batches to avoid rate-limit issues.
- **ScrapeGraphAI**: Scrapes each product page and extracts the current price and name.
- **If (Price Found?)**: Continues only if scraping returns a valid price.
- **Baserow**: Upserts the scraped data into the price_tracker table.
- **Code (Trend Analysis)**: Aggregates weekly data to detect price increases, decreases, or stable trends.
- **Set (Mail Content)**: Formats the trend summary into an HTML email body.
- **Mailchimp**: Sends the seasonal price-trend report to the selected audience segment.
- **Sticky Note**: Documentation node explaining business logic in-workflow.

Set up steps

Setup Time: 10-15 minutes

1. Clone the template: Import the workflow JSON into your n8n instance.
2. Install ScrapeGraphAI: n8n-nodes-scrapegraphai via the Community Nodes panel.
3. Add credentials:
   a. ScrapeGraphAI API Key
   b. Mailchimp API Key & Server Prefix
   c. Baserow API Token
4. Configure Baserow node: Point it to your price_tracker table.
5. Edit product list: In the “Prepare URLs” Code node, replace the sample URLs with your own.
6. Adjust schedule: Modify the Schedule Trigger CRON expression if weekly isn’t suitable.
7. Test run: Execute the workflow manually once to verify credentials and data flow.
8. Activate: Turn on the workflow for automatic weekly monitoring.

Node Descriptions

Core Workflow Nodes:

- **Schedule Trigger** – Initiates the workflow on a weekly CRON schedule.
- **Code (Prepare URLs)** – Generates an array of product URLs/SKUs to scrape.
- **SplitInBatches** – Splits the array into chunks of 5 URLs to stay within request limits.
- **ScrapeGraphAI** – Scrapes each URL, using XPath/CSS selectors to pull price & title.
- **If (Price Found?)** – Filters out failed or empty scrape results.
- **Baserow** – Inserts or updates the price record in the database.
- **Code (Trend Analysis)** – Calculates week-over-week price changes and flags anomalies.
- **Set (Mail Content)** – Creates an HTML table with product, current price, and trend arrow.
- **Mailchimp** – Sends or schedules the email campaign.
- **Sticky Note** – Provides inline documentation and edit hints.

Data Flow:

1. Schedule Trigger → Code (Prepare URLs) → SplitInBatches
2. SplitInBatches → ScrapeGraphAI → If (Price Found?) → Baserow
3. Baserow → Code (Trend Analysis) → Set (Mail Content) → Mailchimp

Customization Examples

Change scraping frequency

    // Schedule Trigger CRON for daily at 07:00 UTC
    0 7 * * *

Add competitor comparison column

    // Code (Trend Analysis)
    item.competitor_price_diff = item.current_price - item.competitor_price;
    return item;

Data Output Format

The workflow outputs structured JSON data:

    {
      "product_name": "Winter Jacket",
      "product_url": "https://example.com/winter-jacket",
      "current_price": 59.99,
      "scrape_date": "2023-11-15T08:21:00Z",
      "weekly_trend": "decrease"
    }

Troubleshooting

Common Issues

- Invalid ScrapeGraphAI key – Verify the API key and ensure your subscription is active.
- Mailchimp “Invalid Audience” error – Double-check the audience ID and that the API key has correct permissions.
- Baserow “Field mismatch” – Confirm your table fields match the names/types in the workflow.

Performance Tips

- Limit each SplitInBatches run to ≤10 URLs to reduce scraping timeouts.
- Enable caching in ScrapeGraphAI to avoid repeated requests to the same URL within short intervals.

Pro Tips:

- Use environment variables for all API keys to avoid hard-coding secrets.
- Add an extra If node to alert you if a product’s price drops below a target threshold.
- Combine with n8n’s Slack node for real-time alerts in addition to Mailchimp summaries.
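The week-over-week comparison in the Code (Trend Analysis) step can be sketched as below. The row shape follows the price_tracker fields and the output format listed above; the 1% tolerance, the function names, and the choice to label a product with no history as "stable" are assumptions.

```javascript
// Classify the change between two prices; tolerance is a hypothetical 1% band.
function weeklyTrend(previousPrice, currentPrice, tolerance = 0.01) {
  if (previousPrice == null || previousPrice === 0) return "stable"; // no history yet
  const change = (currentPrice - previousPrice) / previousPrice;
  if (change > tolerance) return "increase";
  if (change < -tolerance) return "decrease";
  return "stable";
}

// Group rows by product_url and compare the two most recent scrapes.
function analyzeTrends(rows) {
  const byUrl = new Map();
  for (const row of rows) {
    if (!byUrl.has(row.product_url)) byUrl.set(row.product_url, []);
    byUrl.get(row.product_url).push(row);
  }
  const report = [];
  for (const history of byUrl.values()) {
    history.sort((a, b) => new Date(a.scrape_date) - new Date(b.scrape_date));
    const latest = history[history.length - 1];
    const previous = history[history.length - 2];
    report.push({
      ...latest,
      weekly_trend: weeklyTrend(previous?.current_price, latest.current_price),
    });
  }
  return report;
}
```

Each report entry is the latest row for a product plus a weekly_trend field, which matches the shape shown under Data Output Format.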
by vinci-king-01
Medical Research Tracker with Matrix and Pipedrive

⚠️ COMMUNITY TEMPLATE DISCLAIMER: This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template.

This workflow automatically monitors selected government and healthcare-policy websites, extracts newly published or updated policy documents, logs them as deals in a Pipedrive pipeline, and announces critical changes in a Matrix room. It gives healthcare administrators and policy analysts a near real-time view of policy developments without manual web checks.

Pre-conditions/Requirements

Prerequisites

- n8n instance (self-hosted or n8n cloud)
- ScrapeGraphAI community node installed
- Active Pipedrive account with at least one pipeline
- Matrix account & accessible room for notifications
- Basic knowledge of n8n credential setup

Required Credentials

- **ScrapeGraphAI API Key** – Enables the scraping engine
- **Pipedrive OAuth2 / API Token** – Creates & updates deals
- **Matrix Credentials** – Homeserver URL, user, access token (or password)

Specific Setup Requirements

| Variable | Description | Example |
|----------|-------------|---------|
| POLICY_SITES | Comma-separated list of URLs to scrape | https://health.gov/policies,https://who.int/proposals |
| PD_PIPELINE_ID | Pipedrive pipeline where deals are created | 5 |
| PD_STAGE_ID_ALERT | Stage ID for “Review Needed” | 17 |
| MATRIX_ROOM_ID | Room to send alerts (incl. leading !) | !policy:matrix.org |

Edit the initial Set node to provide these values before running.

How it works

Key Steps:

- **Scheduled Trigger**: Runs every 6 hours (configurable) to start the monitoring cycle.
- **Code (URL List Builder)**: Generates an array from POLICY_SITES for downstream batching.
- **SplitInBatches**: Iterates through each policy URL individually.
- **ScrapeGraphAI**: Scrapes page titles, publication dates, and summary paragraphs.
- **If (New vs Existing)**: Compares scraped hash with last run; continues only for fresh content.
- **Merge (Aggregate Results)**: Collects all “new” policies into a single payload.
- **Set (Deal Formatter)**: Maps scraped data to Pipedrive deal fields.
- **Pipedrive Node**: Creates or updates a deal per policy item.
- **Matrix Node**: Posts a formatted alert message in the specified Matrix room.

Set up steps

Setup Time: 15-20 minutes

1. Install Community Node – In n8n, go to Settings → Community Nodes → Install and search for ScrapeGraphAI.
2. Add Credentials – Create new credentials for ScrapeGraphAI, Pipedrive, and Matrix under Credentials.
3. Configure Environment Variables – Open the Set (Initial Config) node and replace placeholders (POLICY_SITES, PD_PIPELINE_ID, etc.) with your values.
4. Review Schedule – Double-click the Schedule Trigger node to adjust the interval if needed.
5. Activate Workflow – Click Activate. The workflow will run at the next scheduled interval.
6. Verify Outputs – Check Pipedrive for new deals and the Matrix room for alert messages after the first run.

Node Descriptions

Core Workflow Nodes:

- **stickyNote** – Provides an at-a-glance description of the workflow logic directly on the canvas.
- **scheduleTrigger** – Fires the workflow periodically (default 6 hours).
- **code (URL List Builder)** – Splits the POLICY_SITES variable into an array.
- **splitInBatches** – Ensures each URL is processed individually to avoid timeouts.
- **scrapegraphAi** – Parses HTML and extracts policy metadata using XPath/CSS selectors.
- **if (New vs Existing)** – Uses hashing to ignore unchanged pages.
- **merge** – Combines all new items so they can be processed in bulk.
- **set (Deal Formatter)** – Maps scraped fields to Pipedrive deal properties.
- **matrix** – Sends formatted messages to a Matrix room for team visibility.
- **pipedrive** – Creates or updates deals representing each policy update.

Data Flow:

scheduleTrigger → code → splitInBatches → scrapegraphAi → if → merge → set → pipedrive → matrix

Customization Examples

1. Add another data field (e.g., policy author)

    // Inside ScrapeGraphAI node → Selectors
    {
      "title": "//h1/text()",
      "date": "//time/@datetime",
      "summary": "//p[1]/text()",
      "author": "//span[@class='author']/text()" // new line
    }

2. Switch notifications from Matrix to Email

    // Replace Matrix node with “Send Email”
    {
      "to": "policy-team@example.com",
      "subject": "New Healthcare Policy Detected: {{$json.title}}",
      "text": "Summary:\n{{$json.summary}}\n\nRead more at {{$json.url}}"
    }

Data Output Format

The workflow outputs structured JSON data for each new policy article:

    {
      "title": "Affordable Care Expansion Act – 2024",
      "url": "https://health.gov/policies/acea-2024",
      "date": "2024-06-14T09:00:00Z",
      "summary": "Proposes expansion of coverage to rural areas...",
      "source": "health.gov",
      "hash": "2d6f1c8e3b..."
    }

Troubleshooting

Common Issues

- ScrapeGraphAI returns empty objects – Verify selectors match the current HTML structure; inspect the site with developer tools and update the node configuration.
- Duplicate deals appear in Pipedrive – Ensure the “Find or Create” option is enabled in the Pipedrive node, using the page hash or url as a unique key.

Performance Tips

- Limit POLICY_SITES to under 50 URLs per run to avoid hitting rate limits.
- Increase Schedule Trigger interval if you notice ScrapeGraphAI rate-limiting.

Pro Tips:

- Store historical scraped data in a database node for long-term audit trails.
- Use the n8n Workflow Executions page to replay failed runs without waiting for the next schedule.
- Add an Error Trigger node to emit alerts if scraping or API calls fail.
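The URL List Builder and the hash-based "New vs Existing" check can be sketched as follows. The description does not say which fields are hashed or where the previous hashes are kept, so hashing title, date, and summary with SHA-256 and holding the last-seen hashes in a Map are assumptions made for illustration.

```javascript
const crypto = require("node:crypto");

// Split the comma-separated POLICY_SITES value into a clean array.
function splitPolicySites(value) {
  return value.split(",").map((s) => s.trim()).filter(Boolean);
}

// Hash the fields that define "the same content" (an assumed choice).
function contentHash(policy) {
  const basis = [policy.title, policy.date, policy.summary].join("\n");
  return crypto.createHash("sha256").update(basis).digest("hex");
}

// Return only policies whose hash differs from the one stored for their URL.
// seenHashes is a Map<url, hash> standing in for whatever store persists
// state between runs.
function findNewPolicies(scraped, seenHashes) {
  const fresh = [];
  for (const policy of scraped) {
    const hash = contentHash(policy);
    if (seenHashes.get(policy.url) === hash) continue; // unchanged page
    seenHashes.set(policy.url, hash);
    fresh.push({ ...policy, source: new URL(policy.url).hostname, hash });
  }
  return fresh;
}
```

Keying the store by URL means an edited summary on an existing page is reported as an update, while an untouched page is skipped on every later run.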
by Yang
🛍️ Pick Best-Value Products from Any Website Using Dumpling AI, GPT-4o, and Google Sheets

Who’s it for

This workflow is for eCommerce researchers, affiliate marketers, and anyone who needs to compare product listings across sites like Amazon. It’s perfect for quickly identifying top product picks based on delivery speed, free shipping, and price.

What it does

Just submit a product listing URL. The workflow will crawl it using Dumpling AI, take screenshots of the pages, and pass them to GPT-4o to extract up to 3 best-value picks. It analyzes screenshots visually, so no HTML scraping is needed. Each result includes:

- product name
- price
- review count
- free delivery date (if available)

How it works

- 📝 Receives a URL through a web form
- 🧠 Uses Dumpling AI to crawl the website
- 📸 Takes screenshots of each product listing
- 🔍 GPT-4o analyzes each image to pick top products
- 🔧 A code node parses and flattens the output
- 📊 Google Sheets stores the result
- 📧 Sends the spreadsheet link via email

Requirements

- **Dumpling AI token**
- **OpenAI key** (GPT-4o)
- **Google Sheet** with columns: product name, price, reviews no., free_delivery_date

> You can customize the AI prompt to extract other visual insights (e.g., ratings, specs).
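The "parses and flattens" code step might look roughly like this sketch. The response shape (a JSON object with a picks array, possibly wrapped in a Markdown code fence) and the input field name modelOutput are assumptions; only the four output column names come from the requirements above.

```javascript
// Parse a model reply that may be wrapped in a Markdown code fence.
// \x60 is the backtick character.
function parseModelJson(text) {
  const cleaned = text
    .trim()
    .replace(/^\x60{3}(?:json)?\s*/i, "")
    .replace(/\s*\x60{3}$/, "");
  return JSON.parse(cleaned);
}

// Flatten per-screenshot picks into one row per product for Google Sheets.
function flattenPicks(screenshots) {
  const rows = [];
  for (const shot of screenshots) {
    let parsed;
    try {
      parsed = parseModelJson(shot.modelOutput);
    } catch {
      continue; // skip replies that are not valid JSON
    }
    const picks = Array.isArray(parsed?.picks) ? parsed.picks : [];
    for (const pick of picks.slice(0, 3)) {
      rows.push({
        "product name": pick.name ?? "",
        price: pick.price ?? "",
        "reviews no.": pick.reviews ?? "",
        free_delivery_date: pick.free_delivery_date ?? "",
      });
    }
  }
  return rows;
}
```

Missing fields become empty strings so every row has the same columns, which keeps the sheet append step from misaligning values.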
by Colton Randolph
This n8n workflow automatically scrapes TechCrunch articles, filters for AI-related content using OpenAI, and delivers curated summaries to your Slack channels. Perfect for individuals or teams who need to stay current on artificial intelligence developments without manually browsing tech news sites.

Who's it for

- AI product teams tracking industry developments and competitive moves
- Tech investors monitoring AI startup coverage and funding announcements
- Marketing teams following AI trends for content and positioning strategies
- Executives needing daily AI industry briefings without manual research overhead
- Development teams staying current on AI tools, frameworks, and breakthrough technologies

How it works

The workflow runs on a daily schedule, crawling a specified number of TechCrunch articles from the current year. Firecrawl extracts clean markdown content while bypassing anti-bot measures and handling JavaScript rendering automatically.

Each article is analyzed by an AI research assistant that determines whether the content relates to artificial intelligence, machine learning, AI companies, or AI technology. Articles marked as "NOT_AI_RELATED" are filtered out automatically.

For AI-relevant articles, OpenAI generates focused 3-bullet-point summaries that capture key insights. These summaries are delivered to your specified Slack channel with the original TechCrunch article title and source link for deeper reading.
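The relevance filter and Slack formatting described above can be illustrated with a small sketch. Only the "NOT_AI_RELATED" marker comes from the description; the article field names (verdict, summary, title, url) and the message layout are hypothetical.

```javascript
// Drop articles the model marked as NOT_AI_RELATED.
function keepAiArticles(articles) {
  return articles.filter(
    (article) => !String(article.verdict).toUpperCase().includes("NOT_AI_RELATED")
  );
}

// Build a Slack message: bold title, up to three bullets, and a source link.
function toSlackText(article) {
  const bullets = article.summary
    .slice(0, 3)
    .map((point) => `• ${point}`)
    .join("\n");
  return `*${article.title}*\n${bullets}\n<${article.url}|Read on TechCrunch>`;
}
```

Matching on the marker as a substring tolerates extra whitespace or punctuation around the model's verdict.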
How to set up

1. Configure Firecrawl: Add your Firecrawl API key to the HTTP Request node
2. Set OpenAI credentials: Add your OpenAI API key to the AI Agent node
3. Connect Slack: Configure your Slack webhook URL and target channel
4. Adjust scheduling: Set your preferred trigger frequency (daily recommended)
5. Test the workflow: Run manually to verify article extraction and Slack delivery

Requirements

- **Firecrawl account** with API access for TechCrunch web scraping
- **OpenAI API key** for AI content analysis and summarization
- **Slack workspace** with webhook permissions for message delivery
- **n8n instance** (cloud or self-hosted) for workflow execution

How to customize the workflow

- Source expansion: Modify the HTTP node URL to target additional tech publications beyond TechCrunch, or adjust the article limit and date filtering for different coverage needs.
- AI focus refinement: Update the OpenAI prompt to focus on specific AI verticals like generative AI, robotics, or ML infrastructure. Add company names or technology terms to the relevance filtering logic.
- Summary formats: Change from 3-bullet summaries to executive briefs, technical analyses, or competitive intelligence reports by modifying the OpenAI summarization prompt.
- Multi-channel delivery: Extend beyond Slack to email notifications, Microsoft Teams, or database storage for historical trend analysis and executive dashboards.
by Incrementors
Description

Submit your page URL, a competitor's page URL, and a target keyword using a simple form. The workflow automatically scrapes both pages, strips all HTML, and sends the full comparison to GPT-4o-mini for analysis. Within seconds, a structured 6-section content gap report lands in your Slack channel, ready to act on. Built for SEO teams, content strategists, and agency analysts who need fast, repeatable competitor insights.

What This Workflow Does

- **Parallel page scraping**: Fetches your page and the competitor's page simultaneously so you get results faster, not one site at a time
- **HTML cleaning**: Strips all scripts, ads, and navigation clutter from both pages, leaving only the actual content GPT-4o-mini needs to compare
- **Content gap identification**: AI pinpoints exactly which topics, subtopics, and questions your page is missing that the competitor already covers
- **Competitive advantage mapping**: Surfaces what your page has that the competitor lacks, so you know what to protect and promote
- **Priority action list**: Delivers 5 concrete, ranked improvements specific to your page, not generic SEO advice
- **Token-efficient processing**: Caps each page at 8,000 characters so every run stays fast and API costs stay predictable
- **Slack report delivery**: Posts the full 6-section analysis with business name, keyword, both URLs, and run date directly to your team channel, ready to act on or forward to a client

Setup Requirements

Tools Needed

- n8n instance (self-hosted or cloud)
- OpenAI account with GPT-4o-mini API access
- Slack workspace with OAuth2 app configured

Estimated Setup Time: 10–15 minutes

Step-by-Step Setup

1. Import the workflow: Open n8n → Workflows → Import from JSON → paste the workflow JSON → click Import
2. Connect your OpenAI credential: Go to node 10. OpenAI — GPT-4o-mini Model → click the credential dropdown → add your OpenAI API key → test the connection
3. Connect your Slack credential: Go to node 12. Slack — Send Gap Report → click the credential dropdown → select OAuth2 → follow the Slack OAuth flow to connect your workspace
4. Set your Slack channel: In node 12. Slack — Send Gap Report, set the channel field to the channel name where reports should be posted (e.g. #seo-reports)
5. Activate the workflow: Toggle the workflow to Active → copy the Form URL from node 1. Form — Submit Page URLs → open it in a browser to test

> ⚠️ Bot-Protected Sites: Some sites return a 403 Forbidden error when scraped. If this happens, open nodes 3. HTTP — Scrape Your Page and 4. HTTP — Scrape Competitor Page, add a header with Name = User-Agent and Value = Mozilla/5.0 (compatible; n8n-bot/1.0) in both nodes.

How It Works (Step by Step)

Step 1. Form: Submit Page URLs
You open the form URL in a browser and fill in four fields: your page URL, the competitor's page URL, the target keyword, and your business name. Submitting the form kicks off the entire workflow automatically.

Step 2. Set: Extract Form Fields
All four form inputs are mapped to clean named variables. A run timestamp is automatically added so every report is dated. These variables flow into every downstream step.

Step 3. HTTP: Scrape Your Page (parallel)
An HTTP request fetches the full HTML content of your page. This step runs at the same time as Step 4, so both pages are retrieved simultaneously without waiting.

Step 4. HTTP: Scrape Competitor Page (parallel)
An identical HTTP request fetches the competitor's page in parallel with Step 3. Both pages are ready at the same time.

Step 5. Code: Clean Your Page HTML
A code step removes all script tags, style tags, and HTML markup from your page. The result is plain readable text, trimmed to 8,000 characters to keep AI costs low and responses fast.

Step 6. Code: Clean Competitor Page HTML
The same cleaning process runs on the competitor's page. This step also carries forward all the form variables (keyword, URLs, business name, run date) so nothing is lost in the merge.
Step 7. Merge: Combine Both Pages
Both cleaned page texts, yours and the competitor's, flow into a merge step that combines them into a single pipeline for the next step.

Step 8. Code: Combine Page Data
A code step safely joins both items into one clean object. If either page failed to scrape, it uses a fallback message instead of crashing the workflow.

Step 9. AI Agent: Gap Analyzer
GPT-4o-mini receives both page texts, the target keyword, business name, and both URLs. It produces a plain-text 6-section analysis: keyword usage comparison, topics your page is missing, topics you have that the competitor lacks, content depth and quality comparison, five priority actions ranked by impact, and a quick 3-sentence verdict.

Step 10. OpenAI: GPT-4o-mini Model
This is the language model powering the AI Agent. It is configured with a temperature of 0.4 for consistent, factual analysis and a max token limit of 1,500 to keep reports concise.

Step 11. Set: Prepare Slack Message
All report fields are assembled into a single clean object: the AI analysis, both URLs, target keyword, business name, and run date. This is the complete payload that goes to Slack.

Step 12. Slack: Send Gap Report
The full report is posted to your Slack channel in a formatted message. It includes the business name, keyword, run date, both URLs, the full 6-section AI analysis, and a footer noting the report was generated by n8n + GPT-4o-mini.
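The cleaning and combining steps (5, 6, and 8) can be sketched as below. The 8,000-character cap via substring comes from the description; the regular expressions, the fallback wording, and the field names are assumptions, and regex-based tag stripping is a deliberately simple stand-in for a real HTML parser.

```javascript
// Strip scripts, styles, and tags, collapse whitespace, and cap the length.
function cleanHtml(html, limit = 8000) {
  return String(html ?? "")
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/&nbsp;/gi, " ")
    .replace(/\s+/g, " ")
    .trim()
    .substring(0, limit);
}

// Join both cleaned pages into one object, using a fallback when a scrape failed.
function combinePages(yourHtml, competitorHtml, form) {
  const fallback = "Page could not be scraped."; // hypothetical wording
  return {
    ...form, // keyword, URLs, business name, run date
    yourText: cleanHtml(yourHtml) || fallback,
    competitorText: cleanHtml(competitorHtml) || fallback,
  };
}
```

Applying the cap after whitespace is collapsed means the 8,000 characters are spent on readable text rather than on leftover indentation.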
Key Features

✅ Parallel scraping: Both pages are fetched at the same time, not one after the other, saving you time on every run
✅ Auto HTML stripping: Scripts, styles, and all tags are removed automatically, so no manual cleanup is needed
✅ Token budget control: Each page is hard-capped at 8,000 characters so API costs stay predictable
✅ Fallback handling: If a page fails to scrape, the workflow continues and notes the failure rather than crashing
✅ 6-section structured report: Every report follows the same format so results are easy to compare across competitors and dates
✅ Slack delivery with metadata: Reports arrive with business name, keyword, run date, and both URLs for full context
✅ Plain text output: No markdown symbols in the AI analysis, making it easy to paste directly into a doc or client report
✅ One-form trigger: The whole workflow starts with a single form submission, with no coding and no manual steps

Customisation Options

- Change the text limit per page: In nodes 5. Code — Clean Your Page HTML and 6. Code — Clean Competitor Page HTML, change .substring(0, 8000) to a higher number (e.g. 12000) if you want deeper analysis on long-form pages. Note this will increase GPT token usage.
- Add email delivery: After node 11. Set — Prepare Slack Message, add a Gmail or SMTP node to also send the report by email. Use the same gapReport variable for the email body.
- Save reports to Google Sheets: Add a Google Sheets node after the Slack node to log every run: business name, keyword, date, competitor URL, and a summary of the verdict section.
- Schedule weekly competitor checks: Replace the form trigger with a Schedule trigger and a Set node with hardcoded URLs and keywords to automatically run gap analysis every Monday morning.
- Expand the AI report sections: In node 9. AI Agent — Gap Analyzer, edit the prompt to add a Section 7 covering suggested internal links, or a Section 8 comparing schema markup signals.
Troubleshooting

OpenAI credential not working:

- Confirm you added the API key in node 10. OpenAI — GPT-4o-mini Model, not elsewhere
- Check that your OpenAI account has available credits
- Make sure you are using a key with access to GPT-4o-mini (not a restricted key)

Scraping returns a 403 or empty result:

- Add a User-Agent header to both 3. HTTP — Scrape Your Page and 4. HTTP — Scrape Competitor Page
- Header Name: User-Agent, Value: Mozilla/5.0 (compatible; n8n-bot/1.0)
- Some enterprise or Cloudflare-protected sites cannot be scraped; try the mobile version of the URL instead

Slack message not arriving:

- Confirm the OAuth2 credential in node 12. Slack — Send Gap Report is connected and authorised
- Check that the channel name is correct and the bot has been invited to that channel
- In Slack, go to the channel → click the channel name → Integrations → confirm the n8n app is listed

Report is too short or generic:

- The page text may have been mostly scripts with little readable content; check the cleaned text in node 5 or 6 by running a test
- Try a different URL format (e.g. without trailing slash) or the AMP version of the page
- Increase the max token setting in node 10 from 1500 to 2000 for more detailed output

Form submission not triggering the workflow:

- Make sure the workflow is set to Active (toggle in the top right of the workflow editor)
- Copy the Form URL fresh from node 1. Form — Submit Page URLs after activating, because inactive workflows generate a test URL, not a live one

Support

Need help setting this up or want a custom version built for your team or agency?

📧 Email: info@incrementors.com
🌐 Website: https://www.incrementors.com/contact-us/
by Onur
Automated B2B Lead Generation: Google Places, Scrape.do & AI Enrichment This workflow is a powerful, fully automated B2B lead generation engine. It starts by finding businesses on Google Maps based on your criteria (e.g., "dentists" in "Istanbul"), assigns a quality score to each, and then uses Scrape.do to reliably access their websites. Finally, it leverages an AI agent to extract valuable contact information like emails and social media profiles. The final, enriched data is then neatly organized and saved directly into a Google Sheet. This template is built for reliability, using Scrape.do to handle the complexities of web scraping, ensuring you can consistently gather data without getting blocked. 🚀 What does this workflow do? Automatically finds businesses using the Google Places API based on a category and location you define. Calculates a leadScore for each business based on its rating, website presence, and operational status to prioritize high-quality leads. Filters out low-quality leads** to ensure you only focus on the most promising prospects. Reliably scrapes the website of each high-quality lead using Scrape.do to bypass common blocking issues and retrieve the raw HTML. Uses an AI Agent (OpenAI) to intelligently parse the website's HTML and extract hard-to-find contact details (emails, social media links, phone numbers). Saves all enriched lead data** to a Google Sheet, creating a clean, actionable list for your sales or marketing team. Runs on a schedule**, continuously finding new leads without any manual effort. 🎯 Who is this for? Sales & Business Development Teams:** Automate prospecting and build targeted lead lists. Marketing Agencies:** Generate leads for clients in specific industries and locations. Freelancers & Consultants:** Quickly find potential clients for your services. Startups & Small Businesses:** Build a customer database without spending hours on manual research. 
✨ Benefits Full Automation: Set it up once and let it run on a schedule to continuously fill your pipeline. AI-Powered Enrichment: Go beyond basic business info. Get actual emails and social profiles that aren't available on Google Maps. Reliable Website Access: Leverages Scrape.do to handle proxies and prevent IP blocks, ensuring consistent data gathering from target websites. High-Quality Leads: The built-in scoring and filtering system ensures you don't waste time on irrelevant or incomplete listings. Centralized Database: All your leads are automatically organized in a single, easy-to-access Google Sheet. ⚙️ How it Works Schedule Trigger: The workflow starts automatically at your chosen interval (e.g., daily). Set Parameters: You define the business type (searchCategory) and location (locationName) in a central Set node. Find Businesses: It calls the Google Places API to get a list of businesses matching your criteria. Score & Filter: A custom Function node scores each lead. An IF node then separates high-quality leads from low-quality ones. Loop & Enrich: The workflow processes each high-quality lead one by one. It uses a scraping service (Scrape.do) to reliably fetch the lead's website content. An AI Agent (OpenAI) analyzes the website's footer to find contact and social media links. Save Data: The final, enriched lead information is appended as a new row in your Google Sheet. 📋 n8n Nodes Used Schedule Trigger Set HTTP Request (for Google Places & Scrape.do) Function If Split in Batches (Loop Over Items) HTML Langchain Agent (with OpenAI Chat Model & Structured Output Parser) Google Sheets 🔑 Prerequisites An active n8n instance. Google Cloud Project with the Places API enabled. Google Places API Key, stored in n8n's Header Auth credentials. A Scrape.do Account and API Token. This is essential for reliably scraping websites without your n8n server's IP getting blocked. OpenAI Account & API Key for the AI-powered data extraction. 
Google Account with access to Google Sheets. Google Sheets API Credentials (OAuth2) configured in n8n. A Google Sheet prepared with columns to store the lead data (e.g., BusinessName, Address, Phone, Website, Email, Facebook, etc.). 🛠️ Setup Import the workflow into your n8n instance. Configure Credentials: Create and/or select your credentials for: Google Places API: In the 2. Find Businesses (Google Places) node, select your Header Auth credential containing your API key. Scrape.do: In the 6a. Scrape Website HTML node, configure credentials for your Scrape.do account. OpenAI: In the OpenAI Chat Model node, select your OpenAI credentials. Google Sheets: In the 7. Save to Google Sheets node, select your Google Sheets OAuth2 credentials. Define Your Search: In the 1. Set Search Parameters node, update the searchCategory and locationName values to match your target market. Link Your Google Sheet: In the 7. Save to Google Sheets node, select your Spreadsheet and Sheet Name from the dropdown lists. Map the incoming data to the correct columns in your sheet. Set Your Schedule: Adjust the Schedule Trigger to run as often as you like (e.g., once a day). Activate the workflow! Your automated lead generation will begin on the next scheduled run.
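The Score & Filter step described above could look roughly like the sketch below. The template does not publish its scoring formula, so the weights, the threshold, and the field names (rating, website, business_status, loosely modeled on Google Places responses) are all assumptions for illustration only.

```python
from typing import Any

# Hypothetical weights and threshold; the template's real values are not documented.
RATING_WEIGHT = 10      # points per rating star (ratings run 0 to 5)
WEBSITE_BONUS = 30      # points when the listing has a website
OPERATIONAL_BONUS = 20  # points when the business is currently operating
MIN_SCORE = 60          # leads scoring below this are dropped


def lead_score(place: dict[str, Any]) -> int:
    """Score one listing from its rating, website presence, and status."""
    score = round(float(place.get("rating") or 0) * RATING_WEIGHT)
    if place.get("website"):
        score += WEBSITE_BONUS
    if place.get("business_status") == "OPERATIONAL":
        score += OPERATIONAL_BONUS
    return score


def filter_leads(places: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Attach a leadScore to every listing and keep only the high-quality ones."""
    scored = [{**place, "leadScore": lead_score(place)} for place in places]
    return [place for place in scored if place["leadScore"] >= MIN_SCORE]
```

A listing without a website has nothing to scrape for contact details, which is why the sketch weights website presence heavily. Tune the weights and threshold to your own definition of a good lead.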
by Bhavy Shekhaliya
Overview AI-powered n8n workflow that creates viral LinkedIn posts by learning from successful content. Features two modules: (1) Telegram-based scraper that builds a vector database of viral LinkedIn posts, and (2) Web form that generates optimized posts using multi-agent AI with RAG (Retrieval-Augmented Generation) from your curated viral content library. Key Capabilities: Scrapes LinkedIn post content via Telegram bot Stores posts in Supabase vector database with OpenAI embeddings 3-agent system analyzes hooks, structures outlines, and generates posts RAG integration retrieves similar viral posts for pattern matching Auto-publishes to LinkedIn or provides formatted output How It Works Module 1: Viral Post Collection (Telegram Bot) Step 1: URL Validation User sends LinkedIn post URL to Telegram bot Workflow validates URL contains "linkedin.com" Shows typing indicator for better UX Step 2: Content Scraping HTTP request fetches post HTML CSS selector extracts main commentary: [data-test-id="main-feed-activity-card__commentary"] Handles scraping failures with error messages Step 3: Vector Storage Converts post text to OpenAI embeddings (text-embedding-ada-002) Stores in Supabase linkedin_post table with vector indexing Sends success confirmation via Telegram Module 2: AI Post Generation (Web Form) Stage 1: Hook Analysis Agent Input: User-provided hook text Process: AI extracts topic, niche/industry, emotional tone, and 3-5 key points Output: Structured JSON with analyzed elements Models: GPT-4o-mini or Gemini 2.5-flash (dual fallback) Stage 2: Post Structure Agent Input: Analyzed hook data Process: Creates 5-section outline (Hook, Problem, Value/Lesson, Solution, CTA) Output: Structured framework for final post Models: GPT-4o-mini or Gemini 2.5-flash Stage 3: Post Generator Agent (RAG) Input: Post structure + topic RAG Process: Queries Supabase vector store for 5 most similar viral posts Analyzes patterns: hooks, storytelling, CTAs, engagement 
metrics Identifies optimal length, formatting, and emotional triggers Output: Complete LinkedIn post applying viral patterns Models: GPT-4o-mini or Gemini 2.5-flash with GPT-5-NANO for structured output Stage 4: Publication Auto-publishes to LinkedIn via API Or returns formatted post text for manual posting How To Use Setup 1. Configure Supabase Vector Database Create Supabase project Create table: linkedin_post with vector column (1536 dimensions for OpenAI embeddings) Enable vector extension: CREATE EXTENSION vector; Update credentials in "Upload Document" and "Supabase Vector Store" nodes 2. Set Up Telegram Bot (Module 1) Create bot via @BotFather Get bot token and update "On Telegram Message" credentials Start bot and get your chat ID Activate workflow 3. Configure OpenAI API Add API key to "Embeddings" nodes (both modules) Configure language model credentials (GPT-4o-mini, GPT-5-NANO) 4. Set Up LinkedIn API (Optional for Module 2) Create LinkedIn app with member permissions Configure OAuth2 credentials in "Create a post" node Or remove node to get text output only 5. Access Web Form Get form URL from "LinkedIn Form" webhook Bookmark for easy access
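To make the RAG step in Stage 3 concrete, here is a minimal in-memory sketch of retrieving the 5 most similar posts by cosine similarity. In the workflow itself the Supabase vector store performs this search inside the database. The function names and the post dictionary shape (text, embedding) are assumptions made for illustration, and real embeddings would have 1536 dimensions rather than the toy vectors you would use to try it out.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query: list[float], posts: list[dict], k: int = 5) -> list[dict]:
    """Return the k stored posts whose embeddings are closest to the query.

    Each post is a hypothetical dict with "text" and "embedding" keys.
    """
    ranked = sorted(
        posts,
        key=lambda post: cosine_similarity(query, post["embedding"]),
        reverse=True,  # most similar first
    )
    return ranked[:k]
```

The query vector would be the embedding of the post structure and topic, produced with the same embedding model used when the posts were stored. Mixing models makes the similarity scores meaningless.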
by Mirai
Icebreaker Generator powered with ChatGPT This n8n template crawls a company website, distills the content with AI, and produces a short, personalized icebreaker you can drop straight into your cold emails or CRM. Perfect for SDRs, founders, and agencies who want “real research” at scale. Good to know Works from a Google Sheet of leads (domain + LinkedIn, etc.). Handles common scrape failures gracefully and marks the lead’s Status as Error. Uses ChatGPT to summarize pages and craft one concise, non-generic opener. Output is written back to the same Google Sheet (IceBreaker, Status). You’ll need Google credentials (for Sheets) and OpenAI credentials (for GPT). How it works Step 1 — Discover internal pages Reads a lead’s website from Google Sheets. Scrapes the home page and extracts all links. A Code node cleans the list (removes emails/anchors/social/external domains, normalizes paths, de-duplicates) and returns unique internal URLs. If the home page is unreachable or no links are found, the lead is marked Error and the workflow moves on. Step 2 — Convert pages to text Visits each collected URL and converts the response into HTML/Markdown text for analysis. You can cap depth/amount with the Limit node. Step 3 — Summarize & generate the icebreaker A GPT node produces a two-paragraph abstract for each page (JSON output). An Aggregate node merges all abstracts for the company. Another GPT node turns the merged summary into a personalized, multi-line icebreaker (spartan tone, non-obvious details). The result is written back to Google Sheets (IceBreaker = ..., Status = Done). The workflow loops to the next lead. How to use Prepare your sheet Include at least: organization_website_url, linkedin_url, and any other lead fields you track. Keep an empty IceBreaker and Status column for the workflow to fill. Connect credentials Google Sheets: use the Google account that owns the sheet and link it in the nodes. 
OpenAI: add your API key to the GPT nodes (“Summarize Website Page”, “Generate Multiline Icebreaker”). Run the workflow Start with the Manual Trigger (or replace with a schedule/webhook). Adjust Limit if you want fewer/more pages per company. Watch Status (Done/Error) and IceBreaker populate in your sheet. Requirements n8n instance Google Sheets account & access to the leads sheet OpenAI API key (for summarization + icebreaker generation) Customizing this workflow Tone & format: tweak the prompts (both GPT nodes) to match your brand voice and structure. Depth: change the Limit node to scan more/fewer pages; add simple rules to prioritize certain paths (e.g., /about, /blog/*). Fields: write additional outputs (e.g., Company Summary, Key Products, Recent News) back to new sheet columns. Lead selection: filter rows by Status = "" (or custom flags) to only process untouched leads. Error handling: expand the Error branch to retry with www./HTTP→HTTPS or to log diagnostics in a separate tab. Tips Keep icebreakers short, specific, and free of clichés; small, non-obvious details from the site convert best. Start with a small batch to validate quality, then scale up. Consider adding a rate limit if target sites throttle requests. In short: Sheet → crawl internal pages → AI abstracts → single tailored icebreaker → write back to the sheet, then repeat for the next lead. This workflow pairs well with our automated cold-emailing workflow.
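The link-cleaning Code node from Step 1 can be approximated with the sketch below. It is not the template's actual code: the function name and the exact normalization rules are assumptions, shown only to illustrate dropping emails, anchors, and external domains (which also covers social links), normalizing paths, and de-duplicating.

```python
from urllib.parse import urljoin, urlparse


def clean_internal_links(base_url: str, hrefs: list[str]) -> list[str]:
    """Return unique internal URLs found on a page, in first-seen order."""
    base_host = urlparse(base_url).netloc.lower().removeprefix("www.")
    seen: set[str] = set()
    result: list[str] = []
    for href in hrefs:
        href = href.strip()
        # Skip empty values, email/phone links, in-page anchors, and script links.
        if not href or href.startswith(("mailto:", "tel:", "#", "javascript:")):
            continue
        parsed = urlparse(urljoin(base_url, href))  # resolve relative paths
        host = parsed.netloc.lower().removeprefix("www.")
        # External domains, including social networks, fail this host check.
        if parsed.scheme not in ("http", "https") or host != base_host:
            continue
        # Normalize: drop query string and fragment, trim the trailing slash.
        path = parsed.path.rstrip("/") or "/"
        url = f"{parsed.scheme}://{parsed.netloc}{path}"
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Dropping query strings is a deliberate simplification that suits brochure-style company sites. Keep them if the target sites route pages through parameters.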
by Harsh Agrawal
Automated SEO Intelligence Platform with DataForSEO and Claude Transform any company website into a detailed SEO audit report in minutes! This workflow combines real-time web scraping, comprehensive SEO data analysis, and advanced AI reasoning to deliver client-ready reports automatically. Perfect for digital agencies scaling their audit services, freelance SEO consultants automating research, or SaaS teams analyzing competitor strategies before sales calls. The Process Discovery Phase: Input a company name and website URL to kick things off. The system begins with website content extraction. Intelligence Gathering: A dedicated scraper sub-workflow extracts all website content and converts it to structured markdown. Strategic Analysis: LLMs process the scraped content to understand the business model, target market, and competitive positioning. They generate business research insights and product strategy recommendations tailored to that specific company. Once this analysis completes, the DataForSEO API pulls technical metrics, backlink profiles, keyword rankings, and site health indicators. Report Assembly: All findings flow into a master report generator that structures the data into sections covering technical SEO, content strategy, competitive landscape, and actionable next steps. Custom branded cover and closing pages are added. Delivery: The HTML report is converted to PDF and emailed directly to your recipient, with no manual intervention needed. Setup Steps Add API credentials: OpenRouter (for AI), DataForSEO (for scraping/SEO data), and PDFco (for PDF generation) Configure email sending through your preferred service (Gmail, SendGrid, etc.) Optional: Upload custom first/last page PDFs for white-label branding Test with your own website first to see the magic happen! Customize It Adjust analysis depth: Modify the AI prompts to focus on specific SEO aspects (local SEO, e-commerce, B2B SaaS, etc.) 
Change report style: Edit the HTML template in the Sample_Code node for different formatting Add integrations: Connect to your CRM to automatically trigger reports when leads enter your pipeline Scale it up: Process multiple URLs in batch by feeding a Google Sheet of prospects What You'll Need OpenRouter account (Claude Opus 4.1 recommended for best insights) DataForSEO subscription (handles both scraping and SEO metrics) PDFco account (converts your reports to professional PDFs) Email service credentials configured in n8n Need Help? Connect with me on LinkedIn if you have any questions.