by Alok Kumar
This n8n workflow shows how to extract website content, index it in Pinecone, and leverage Airtable to power a chat agent for customer Q&A. Use cases include: Building a knowledge base from your website. Creating a chatbot that answers customer queries using your own site content. Powering RAG workflows for FAQs, support docs, or product knowledge. How it works Workflow starts with a manual trigger or chat message. Website content is fetched via HTTP Request. The HTML body is extracted and converted into clean Markdown. Text is split into chunks (~500 chars with 50 overlap) using the Character Text Splitter. OpenAI embeddings** are generated for each chunk. Content and embeddings are stored in Pinecone with namespace separation. A Chat Agent (powered by OpenAI or OpenRouter) retrieves answers from Pinecone and Airtable. Memory buffer** allows multi-turn conversations. A billing tool (Airtable) provides dynamic billing-related answers when needed. How to use Replace the sample website URL in the HTTP Request node with your own domain or content source. Update Normalize code based on markdown content output to remove noise. Adjust chunk size in the Text Splitter for your website markdown output. In this example, the Character Text Splitter with separator ###### worked really well. Always check the Markdown output to fine-tune your splitting logic. Update Pinecone namespace to match your project. Customize the Chat Agent system prompt to fit your brand voice and response rules. Connect to your own Airtable schema if you want live billing/payment data access. Requirements OpenAI account** (for embeddings + chat model). Pinecone account** (vector DB for semantic search). Airtable account** (if using the billing tool). (Optional) OpenRouter account (alternative chat model provider). n8n self-hosted or cloud. Need Help? Ask in the n8n Forum! Happy Automating! 🚀
by Madame AI
AI-Powered Top GitHub Talent Sourcing (by Language & Location) to Google Sheet This n8n template is a powerful talent sourcing engine that finds, analyzes, and scores GitHub contributors using a custom AI formula. This workflow is ideal for technical recruiters, hiring managers, and team leads who want to build a pipeline of qualified candidates based on specific technical skills and location. Self-Hosted Only This Workflow uses a community contribution and is designed and tested for self-hosted n8n instances only. How it works The workflow runs on a Schedule Trigger (e.g., hourly) to constantly find new candidates. A BrowserAct node ("Run a workflow task") initiates a scraping job on GitHub based on your criteria (e.g., "Python" developers in "Berlin"). A second BrowserAct node ("Get details") waits for the scraping to complete. If the job fails, a Slack alert is sent. A Code node processes the raw scraped data, splitting the list of developers into individual items. An AI Agent, powered by Google Gemini, analyzes each profile. It scores their resume/summary and calculates a final weighted FinalScore based on their followers, repositories, and resume quality. The structured and scored candidate data is then saved to a Google Sheet, using the "Name" column to prevent duplicates. A final Slack message is sent to notify you that the GitHub contributors list has been successfully updated. Requirements BrowserAct** API account for web scraping BrowserAct* "Source Top GitHub Contributors by Language & Location*" Template BrowserAct** n8n Community Node -> (n8n Nodes BrowserAct) Gemini** account for the AI Agent Google Sheets** credentials for saving leads Slack** credentials for sending notifications Need Help? How to Find Your BrowseAct API Key & Workflow ID How to Connect n8n to Browseract How to Use & Customize BrowserAct Templates How to Use the BrowserAct N8N Community Node Workflow Guidance and Showcase Automate Talent Sourcing: Find GitHub Devs with n8n & Browseract
by Automate With Marc
📥 Invoice Intake & Notification Workflow This automated n8n workflow monitors a Google Drive folder for newly uploaded invoice PDFs, extracts essential information (like client name, invoice number, amount, due date), logs the data into a Google Sheet for recordkeeping, and sends a formatted Telegram message to notify the billing team. For step-by-step video build of workflows like this: https://www.youtube.com/@automatewithmarc ✅ What This Workflow Does 🕵️ Watches a Google Drive folder for new invoice files 📄 Extracts data from PDF invoices using AI (LangChain Information Extractor) 📊 Appends extracted data into a structured Google Sheet 💬 Notifies the billing team via Telegram with invoice details 🤖 Optionally uses Claude Sonnet AI model to format human-friendly summaries ⚙️ How It Works – Step-by-Step Trigger: Workflow starts when a new PDF invoice is added to a specific Google Drive folder. Download & Parse: The file is downloaded and its content extracted. Data Extraction: AI-powered extractor pulls invoice details (invoice number, client, date, amount, etc.). Log to Google Sheets: All extracted data is appended to a predefined Google Sheet. AI Notification Formatting: An Anthropic Claude model formats a clear invoice notification message. Telegram Alert: The formatted summary is sent to a Telegram channel or group to alert the billing team. 🧠 AI & Tools Used Google Drive Trigger & File Download PDF Text Extraction Node LangChain Information Extractor Google Sheets Node (Append Data) Anthropic Claude (Telegram Message Formatter) Telegram Node (Send Notification) 🛠️ Setup Instructions Google Drive: Set up OAuth2 credentials and specify the folder ID to watch. Google Sheets: Link the workflow to your invoice tracking sheet. Telegram: Set up your Telegram bot and obtain the chat ID. Anthropic & OpenAI: Add your Claude/OpenAI credentials if formatting is enabled. 💡 Use Cases Automated bookkeeping and invoice tracking Real-time billing alerts for accounting teams AI-powered invoice ingestion and summary
by Akshay Chug
Quick overview This scheduled workflow pulls article URLs from Google Sheets, scrapes each page, uses Anthropic Claude to generate a structured news briefing, saves results to Notion, posts a compiled digest to Slack, and appends a processing log back to Google Sheets. How it works Runs on a weekday morning cron schedule. Loads configuration values (sheet IDs, Notion database ID, Slack channel, and prompt settings) and reads article URLs from Google Sheets. Limits the run to the first 10 URLs, fetches each article’s HTML via HTTP, and extracts a cleaned text version. Sends the article text to Anthropic Claude to produce a JSON briefing with headline, summary, why it matters, and a relevance score. Creates a new page in Notion for each generated briefing. Appends each briefing record to a Google Sheets log and aggregates all briefings into a single daily message. Posts the aggregated daily digest to a Slack channel. Setup Add credentials for Google Sheets, Notion, Slack, and Anthropic (Claude) in n8n. Create a Google Sheet (tab name: urls) with a url column, and set the Google Sheet ID in the configuration. Create a second Google Sheet (tab name: log) for logging, and set the log sheet ID used by the append step. Create a Notion database for briefings and set the database/page target ID used by the Notion step. Set your Slack channel and review the cron expression and maxArticles limit to match your schedule and desired volume.
by Simeon Penev
Who’s it for Growth, marketing, sales, and founder teams that want a decision-ready Ideal Customer Profile (ICP)—grounded in their own site content. How it works / What it does On form submission* collects *Website URL* and *Business Name** and redirects to Google Drive Folder after the final node. Crawl and Scrape the Website Content* - crawls and scrape *20 pages** from the website. ICP Creator* builds a *Markdown ICP** with: A) Executive Summary B) One-Pager ICP C) Tiering & Lead Scoring D) Demand Gen & ABM Plays E) Evidence Log F) Section Confidence Facts vs. Inferences, confidence scores and tables. Markdown to Google Doc* converts Markdown to Google Docs batchUpdate requests. Then this is used in *Update a document** for updating the empty doc. Create a document* + *Update a document* generate *“ICP for <Business Name>”** in your Drive folder and apply formatting. How to set up 1) Add credentials: Firecrawl (Authorization header), OpenAI (Chat), Google Docs OAuth2. 2) Replace placeholders: {{API_KEY}}, {{google_drive_folder_id}}, {{google_drive_folder_url}}. 3) Publish and open the Form URL to test. Requirements Firecrawl API key • OpenAI API key • Google account with access to the target Drive folder. Resources Google OAuth2 Credentials Setup - https://docs.n8n.io/integrations/builtin/credentials/google/oauth-generic/ OpenAI API key - https://docs.n8n.io/integrations/builtin/credentials/openai/ Firecrawl API key - https://take.ms/lGcUp
by Madame AI
Find & Qualify Funded Leads with BrowserAct & Gemini This n8n template helps you find new investment leads by automatically scraping articles for funding announcements and analyzing them with an AI Agent. This workflow is ideal for venture capitalists, sales teams, or market researchers who need to automatically track and compile lists of recently funded companies. Self-Hosted Only This Workflow uses a community contribution and is designed and tested for self-hosted n8n instances only. How it works The workflow is triggered manually but can be set to a Cron node to run on a schedule. A Google Sheet node loads a list of keywords (e.g., "Series A," "Series B") and geographic locations to search for. The workflow loops through each keyword, initiating BrowserAct web scraping tasks to collect relevant articles. A second set of BrowserAct nodes patiently monitors the scraping jobs, waiting for them to complete before proceeding. Once all articles are collected, they are merged and fed into an AI Agent node, powered by Google Gemini. The AI Agent processes the articles to identify companies that recently received funding, extracting the Company Name, the Field of Investment, and the source URL. A Code node transforms the AI's JSON output into a clean, itemized format. An If node filters out any entries where no company was found, ensuring data quality. The qualified leads are automatically added or updated in a Google Sheet, matching by "Company" to prevent duplicates. Finally, a Slack message is sent to a channel to notify your team that the lead list has been updated. Requirements BrowserAct** API account for web scraping BrowserAct** n8n Community Node -> (n8n Nodes BrowserAct) BrowserAct* "Funding Announcement to Lead List (TechCrunch)*" Template (or a similar scraping workflow) Gemini** account for the AI Agent Google Sheets** credentials for input and output Slack** credentials for sending notifications Need Help? How to Find Your BrowseAct API Key & Workflow ID How to Connect n8n to Browseract How to Use & Customize BrowserAct Templates How to Use the BrowserAct n8n Community Node Workflow Guidance and Showcase How to Automatically Find Leads from Funding News (n8n Workflow Tutorial)
by Dean Pike
Convert any website into a searchable vector database for AI chatbots. Submit a URL, choose scraping scope, and this workflow handles everything: scraping, cleaning, chunking, embedding, and storing in Supabase. What it does Scrapes websites using Apify (3 modes: full site unlimited, full site limited, single URL) Cleans content (removes navigation, footer, ads, cookie banners, etc) Chunks text (800 chars, markdown-aware) Generates embeddings (Google Gemini, 768 dimensions) Stores in Supabase vector database Requirements Apify account + API token Supabase database with pgvector extension Google Gemini API key Setup Create Supabase documents table with embedding column (vector 768). Run this SQL query in your Supabase project to enable the vector store setup Add your Apify API token to all three "Run Apify Scraper" nodes Add Supabase and Gemini credentials Test with small site (5-10 pages) or single page/URL first Next steps Connect your vector store to an AI chatbot for RAG-powered Q&A, or build semantic search features into your apps. Tip: Start with page limits to test content quality before full-site scraping. Review chunks in Supabase and adjust Apify filters if needed for better vector embeddings. Sample Outputs Apify actor "runs" in Apify Dashboard from this workflow Supabase docuemnts table with scraped website content ingested in chunks with vector embeddings
by Shun Nakayama
Auto-Audit SEO Traffic Drops with AI & GSC Automatically monitor your Google Search Console data to catch SEO performance drops before they become critical. This workflow identifies pages losing rankings and clicks, scrapes the live content, and uses AI to analyze the gap between "Search Queries" (User Intent) and "Page Content" (Reality). It then delivers actionable fixes—including specific Title rewrites and missing H2 headings—directly to Slack. Ideal for SEO managers, content marketers, and site owners who want a pro-level SEO consultant running 24/7 on autopilot. How it works Compare & Detect: Compares last month's GSC performance with the previous month to identify pages with declining traffic. Deep Dive: For each struggling page, it fetches the top search queries and scrapes the actual Title and H2 tags from your live website. AI Analysis: An AI Agent analyzes why the page is failing by comparing user intent vs. actual content. Report: Sends a "Consultant-style" report to Slack with quantitative data and specific improvement tasks (e.g., "Add this H2 heading"). Set up steps Configure: Open the 📝 Edit Me: Config node and enter your GSC Property URL (e.g., sc-domain:example.com). Connect Credentials: Google Search Console: To fetch performance data. OpenAI: To analyze content and generate ideas. Slack: To receive the weekly reports. Activate: Turn on the workflow to receive your SEO audit every Monday morning.
by Ravi Patel
Quick overview This workflow takes a list of Google Maps search queries, extracts candidate business website URLs from the search results, visits each site, scrapes email addresses from the page content, and appends cleaned, deduplicated emails to Google Sheets fully automated, no manual research needed. How it works A manual trigger fires with a pinned list of Google Maps search queries, each representing a location and business type combination. A query loop iterates through each query, launching a sub-workflow execution per query and applying a configurable wait between executions to reduce request bursts. The sub-workflow sends an HTTP GET to the Google Maps search results URL for that query and reads the raw HTML response. A code node extracts all URLs from the HTML using a regex pattern, after which a filter node removes known system domains such as Google, gstatic, schema.org, and Wix internals, and a deduplication node removes repeated URLs before visiting. A URL loop processes each candidate business website one at a time, fetching the page HTML and passing it into the email processing flow. A code node applies a regex to the fetched page content to extract email-shaped strings, excluding common image file extensions, and a loop-over-pages structure aggregates emails across all visited pages for a given query. The aggregated email array is split into one item per email, deduplicated, filtered to remove junk and system addresses, and appended to the configured Google Sheet. Setup Add a Google Sheets OAuth credential and select the target spreadsheet and sheet tab in the Google Sheets append step. Update the list of input queries (the query field) used by the manual trigger to match the locations/industries you want to scrape. Adjust the wait duration between query executions if you need slower or faster pacing for Google Maps requests. Requirements Google Sheets account for storing and appending scraped email output n8n instance with HTTP Request node access and the ability to execute sub-workflows Customization The query list in the manual trigger is the primary lever swap in any location, industry, or keyword combination you want to target. The filter node for irrelevant domains can be extended with additional regex patterns if you find recurring junk domains in your output. You can also modify the email regex in the Scrape emails from page code node if you want to apply stricter validation rules or capture emails in specific formats only. If you want to scale beyond a manual run, replace the manual trigger with a scheduled trigger and read queries from a Google Sheet rather than using pinned data, which removes the need to redeploy the workflow for each new query list. Additional info Try It Out This n8n template is a lightweight, no-headless-browser solution for scraping business email addresses from Google Maps search results. You provide search queries and a Google Sheet destination, and the workflow handles URL extraction, site crawling, email parsing, deduplication, and output automatically. It is best suited for targeted, low-volume scraping runs where you need a quick way to collect contact emails for a defined set of business categories and locations, without setting up a full scraping infrastructure. Need help setting this up or want a version with AI-powered email validation and outreach sequencing built in? Visit https://patelravi.co.in/n8n_expert
by Oneclick AI Squad
This n8n workflow automates task creation and scheduled reminders for users via a Telegram bot, ensuring timely notifications across multiple channels like email and Slack. It streamlines task management by validating inputs, storing tasks securely, and delivering reminders while updating statuses for seamless follow-up. Key Features Enables users to create tasks directly in chat via webhook integration. Triggers periodic checks for due tasks and processes them individually for accurate reminders. Routes reminders to preferred channels (Telegram, email, or Slack) based on user settings. Validates inputs, handles errors gracefully, and logs task data for persistence and auditing. Workflow Process The Webhook Entry Point node receives task creation requests from users via chat (e.g., Telegram bot), including details like user ID, task description, and channel preferences. The Input Validation node checks for required fields (e.g., user ID, task description); if validation fails, it routes to the Error Response node. The Save to Database node stores validated task data securely in a database (e.g., PostgreSQL, MongoDB, or MySQL) for persistence. The Success Response node (part of Response Handlers) returns a confirmation message to the user in JSON format. The Schedule Trigger node runs every 3 minutes to check for pending reminders (with a 5-minute buffer for every hour to avoid duplicates). The Fetch Due Tasks node queries the database for tasks due within the check window (e.g., reminders set for within 3 minutes). The Tasks Check node verifies if fetched tasks exist and are eligible for processing. The Split Items node processes each due task individually to handle them in parallel without conflicts. The Channel Router node directs reminders to the appropriate channel based on task settings (e.g., email, Slack, or Telegram). The Email Sender node sends HTML-formatted reminder emails with task details and setup instructions. The Slack Sender node delivers Slack messages using webhooks, including task formatting and user mentions. The Telegram Sender node sends Telegram messages via bot API, including task ID, bot setup, and conversation starters. The Update Task Status node marks the task as reminded in the database (e.g., updating status to "sent" with timestamp). The Workflow Complete! node finalizes the process, logging completion and preparing for the next cycle. Setup Instructions Import the workflow into n8n and configure the Webhook Entry Point with your Telegram bot's webhook URL and authentication. Set up database credentials in the Save to Database and Fetch Due Tasks nodes (e.g., connect to PostgreSQL or MongoDB). Configure channel-specific credentials: Telegram bot token for Telegram Sender, email SMTP for Email Sender, and Slack webhook for Slack Sender. Adjust the Schedule Trigger interval (e.g., every 3 minutes) and add any custom due-time logic in Fetch Due Tasks. Test the workflow by sending a sample task creation request via the webhook and simulating due tasks to verify reminders and status updates. Monitor executions in n8n dashboard and tweak validation rules or response formats as needed for your use case. Prerequisites Telegram bot setup with webhook integration for task creation and messaging. Database service (e.g., PostgreSQL, MongoDB, or MySQL) for task storage and querying. Email service (e.g., SMTP provider) and Slack workspace for multi-channel reminders. n8n instance with webhook and scheduling enabled. Basic API knowledge for bot configuration and channel routing. Modification Options Customize the Input Validation node to add fields like priority levels or recurring task flags. Extend the Channel Router to include additional channels (e.g., Microsoft Teams or SMS via Twilio). Modify the Schedule Trigger to use dynamic intervals based on task urgency or user preferences. Enhance the Update Task Status node to trigger follow-up actions, like archiving completed tasks. Adjust the Telegram Sender node for richer interactions, such as inline keyboards for task rescheduling. Explore More AI Workflows: Get in touch with us for custom n8n automation!
by vinci-king-01
Breaking News Aggregator with SendGrid and PostgreSQL ⚠️ COMMUNITY TEMPLATE DISCLAIMER: This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template. This workflow automatically scrapes multiple government and regulatory websites, extracts the latest policy or compliance-related news, stores the data in PostgreSQL, and instantly emails daily summaries to your team through SendGrid. It is ideal for compliance professionals and industry analysts who need near real-time awareness of regulatory changes impacting their sector. Pre-conditions/Requirements Prerequisites n8n instance (self-hosted or n8n.cloud) ScrapeGraphAI community node installed Operational SendGrid account PostgreSQL database accessible from n8n Basic knowledge of SQL for table creation Required Credentials ScrapeGraphAI API Key** – Enables web scraping and parsing SendGrid API Key** – Sends email notifications PostgreSQL Credentials** – Host, port, database, user, and password Specific Setup Requirements | Resource | Requirement | Example Value | |----------------|------------------------------------------------------------------|------------------------------| | PostgreSQL | Table with columns: id, title, url, source, published_at| news_updates | | Allowed Hosts | Outbound HTTPS access from n8n to target sites & SendGrid endpoint| https://*.gov, https://api.sendgrid.com | | Keywords List | Comma-separated compliance terms to filter results | GDPR, AML, cybersecurity | How it works This workflow automatically scrapes multiple government and regulatory websites, extracts the latest policy or compliance-related news, stores the data in PostgreSQL, and instantly emails daily summaries to your team through SendGrid. It is ideal for compliance professionals and industry analysts who need near real-time awareness of regulatory changes impacting their sector. Key Steps: Schedule Trigger**: Runs once daily (or at any chosen interval). ScrapeGraphAI**: Crawls predefined regulatory URLs and returns structured article data. Code (JS)**: Filters results by keywords and formats them. SplitInBatches**: Processes articles in manageable chunks to avoid timeouts. If Node**: Checks whether each article already exists in the database. PostgreSQL**: Inserts only new articles into the news_updates table. Set Node**: Generates an email-friendly HTML summary. SendGrid**: Dispatches the compiled summary to compliance stakeholders. Set up steps Setup Time: 15-20 minutes Install ScrapeGraphAI Node: From n8n, go to “Settings → Community Nodes → Install”, search “ScrapeGraphAI”, and install. Create PostgreSQL Table: CREATE TABLE news_updates ( id SERIAL PRIMARY KEY, title TEXT, url TEXT UNIQUE, source TEXT, published_at TIMESTAMP ); Add Credentials: Navigate to “Credentials”, add ScrapeGraphAI, SendGrid, and PostgreSQL credentials. Import Workflow: Copy the JSON workflow, paste into “Import from Clipboard”. Configure Environment Variables (optional): REG_NEWS_KEYWORDS, SEND_TO_EMAILS, DB_TABLE_NAME. Set Schedule: Open the Schedule Trigger node and define your preferred cron expression. Activate Workflow: Toggle “Active”, then click “Execute Workflow” once to validate all connections. Node Descriptions Core Workflow Nodes: Schedule Trigger** – Initiates the workflow at the defined interval. ScrapeGraphAI** – Scrapes and parses news listings into JSON. Code** – Filters articles by keywords and normalizes timestamps. SplitInBatches** – Prevents database overload by batching inserts. If** – Determines whether an article is already stored. PostgreSQL** – Executes parameterized INSERT statements. Set** – Builds the HTML email body. SendGrid** – Sends the daily digest email. Data Flow: Schedule Trigger → ScrapeGraphAI → Code → SplitInBatches → If → PostgreSQL → Set → SendGrid Customization Examples Change Keyword Filtering // Code Node snippet const keywords = ['GDPR','AML','SOX']; // Add or remove terms item.filtered = keywords.some(k => item.title.includes(k)); return item; Switch to Weekly Digest { "trigger": { "cronExpression": "0 9 * * 1" // Every Monday at 09:00 } } Data Output Format The workflow outputs structured JSON data: { "title": "Data Privacy Act Amendment Passed", "url": "https://regulator.gov/news/1234", "source": "regulator.gov", "published_at": "2024-06-12T14:30:00Z" } Troubleshooting Common Issues ScrapeGraphAI node not found – Install the community node and restart n8n. Duplicate key error in PostgreSQL – Ensure the url column is marked UNIQUE to prevent duplicates. Emails not sending – Verify SendGrid API key and check account’s daily limit. Performance Tips Limit initial scrape URLs to fewer than 20 to reduce run time. Increase SplitInBatches size only if your database can handle larger inserts. Pro Tips: Use environment variables to manage sensitive credentials securely. Add an Error Trigger node to catch and log failures for auditing purposes. Combine with Slack or Microsoft Teams nodes to push instant alerts alongside email digests.
by Daniel
Transform any website into a structured knowledge repository with this intelligent crawler that extracts hyperlinks from the homepage, intelligently filters images and content pages, and aggregates full Markdown-formatted content—perfect for fueling AI agents or building comprehensive company dossiers without manual effort. 📋 What This Template Does This advanced workflow acts as a lightweight web crawler: it scrapes the homepage to discover all internal links (mimicking a sitemap extraction), deduplicates and validates them, separates image assets from textual pages, then fetches and converts non-image page content to clean Markdown. Results are seamlessly appended to Google Sheets for easy analysis, export, or integration into vector databases. Automatically discovers and processes subpage links from the homepage Filters out duplicates and non-HTTP links for efficient crawling Converts scraped content to Markdown for AI-ready formatting Categorizes and stores images, links, and full content in a single sheet row per site 🔧 Prerequisites Google account with Sheets access for data storage n8n instance (cloud or self-hosted) Basic understanding of URLs and web links 🔑 Required Credentials Google Sheets OAuth2 API Setup Go to console.cloud.google.com → APIs & Services → Credentials Click "Create Credentials" → Select "OAuth client ID" → Choose "Web application" Add authorized redirect URIs: https://your-n8n-instance.com/rest/oauth2-credential/callback (replace with your n8n URL) Download the client ID and secret, then add to n8n as "Google Sheets OAuth2 API" credential type During setup, grant access to Google Sheets scopes (e.g., spreadsheets) and test the connection by listing a sheet ⚙️ Configuration Steps Import the workflow JSON into your n8n instance In the "Set Website" node, update the website_url value to your target site (e.g., https://example.com) Assign your Google Sheets credential to the three "Add ... to Sheet" nodes Update the documentId and sheetName in those nodes to your target spreadsheet ID and sheet name/ID Ensure your sheet has columns: "Website", "Links", "Scraped Content", "Images" Activate the workflow and trigger manually to test scraping 🎯 Use Cases Knowledge base creation: Crawl a company's site to aggregate all content into Sheets, then export to Notion or a vector DB for internal wikis AI agent training: Extract structured Markdown from industry sites to fine-tune LLMs on domain-specific data like legal docs or tech blogs Competitor intelligence: Build dossiers by crawling rival websites, separating assets and text for SEO audits or market analysis Content archiving: Preserve dynamic sites (e.g., news portals) as static knowledge dumps for compliance or historical research ⚠️ Troubleshooting No links extracted: Verify the homepage has tags; test with a simple site like example.com and check HTTP response in executions Sheet update fails: Confirm column names match exactly (case-sensitive) and credential has edit permissions; try a new blank sheet Content truncated: Google Sheets limits cells to ~50k chars—adjust the .slice(0, 50000) in "Add Scraped Content to Sheet" or split into multiple rows Rate limiting errors: Add a "Wait" node after "Scrape Links" with 1-2s delay if the site blocks rapid requests