by vinci-king-01
Software Vulnerability Tracker with Pushover and Notion

⚠️ COMMUNITY TEMPLATE DISCLAIMER: This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template.

This workflow automatically scans multiple patent databases on a weekly schedule, filters new filings relevant to selected technology domains, saves the findings to Notion, and pushes instant alerts to your mobile device via Pushover. It is ideal for R&D teams and patent attorneys who need up-to-date insights on emerging technology trends and competitor activity.

Pre-conditions/Requirements

Prerequisites
- An n8n instance (self-hosted or n8n cloud)
- ScrapeGraphAI community node installed
- Active Notion account with an integration created
- Pushover account (user key & application token)
- List of technology keywords / CPC codes to monitor

Required Credentials
- **ScrapeGraphAI API Key** – Enables web scraping of patent portals
- **Notion Credential** – Internal Integration Token with database write access
- **Pushover Credential** – App Token + User Key for push notifications

Additional Setup Requirements

| Service | Needed Item | Where to obtain |
|---------|-------------|-----------------|
| USPTO, EPO, WIPO, etc. | Public URLs for search endpoints | Free/public |
| Notion | Database with properties: Title, Abstract, URL, Date | Create in Notion |
| Keyword List | Text file or environment variable PATENT_KEYWORDS | Define yourself |

How it works

Key Steps:
- **Schedule Trigger**: Fires every week (default Monday 08:00 UTC).
- **Code (Prepare Queries)**: Builds search URLs for each keyword and data source.
- **SplitInBatches**: Processes one query at a time to respect rate limits.
- **ScrapeGraphAI**: Scrapes patent titles, abstracts, links, and publication dates.
- **Code (Normalize & Deduplicate)**: Cleans data, converts dates, and removes already-logged patents.
- **IF Node**: Checks whether new patents were found.
- **Notion Node**: Inserts new patent entries into the specified database.
- **Pushover Node**: Sends a concise alert summarizing the new filings.
- **Sticky Notes**: Document configuration tips inside the workflow.

Set up steps

Setup Time: 10-15 minutes

1. Install ScrapeGraphAI: In n8n, go to “Settings → Community Nodes” and install @n8n-nodes/scrapegraphai.
2. Add Credentials:
   - ScrapeGraphAI: paste your API key.
   - Notion: add the internal integration token and select your database.
   - Pushover: provide your App Token and User Key.
3. Configure Keywords: Open the first Code node and edit the keywords array (e.g., ["quantum computing", "Li-ion battery", "5G antenna"]).
4. Point to Data Sources: In the same Code node, adjust the sources array if you want to add/remove patent portals.
5. Set Notion Database Mapping: In the Notion node, map properties (Name, Abstract, Link, Date) to incoming JSON fields.
6. Adjust Schedule (optional): Double-click the Schedule Trigger and change the CRON expression to your preferred interval.
7. Test Run: Execute the workflow manually. Confirm that the Notion database is populated and a Pushover notification arrives.
8. Activate: Switch the workflow to “Active” to enable automatic weekly execution.

Node Descriptions

Core Workflow Nodes:
- **Schedule Trigger** – Defines the weekly execution time.
- **Code (Build Search URLs)** – Dynamically constructs patent search URLs.
- **SplitInBatches** – Sequentially feeds each query to the scraper.
- **ScrapeGraphAI** – Extracts patent metadata from HTML pages.
- **Code (Normalize Data)** – Formats dates, adds UUIDs, and checks for duplicates.
- **IF** – Determines whether new patents exist before proceeding.
- **Notion** – Writes new patent records to your Notion database.
- **Pushover** – Sends real-time mobile/desktop notifications.

Data Flow: Schedule Trigger → Code (Build Search URLs) → SplitInBatches → ScrapeGraphAI → Code (Normalize Data) → IF → Notion & Pushover

Customization Examples

Change Notification Message

    // Inside the Pushover node "Message" field
    return {
      message: `📜 ${items[0].json.count} new patent(s) detected in ${new Date().toDateString()}`,
      title: '🆕 Patent Alert',
      url: items[0].json.firstPatentUrl,
      url_title: 'Open first patent'
    };

Add Slack Notification Instead of Pushover

    // Replace the Pushover node with a Slack node
    {
      text: `${$json.count} new patents published:\n${$json.list.join('\n')}`,
      channel: '#patent-updates'
    }

Data Output Format

The workflow outputs structured JSON data:

    {
      "title": "Quantum Computing Device",
      "abstract": "A novel qubit architecture that ...",
      "url": "https://patents.example.com/US20240012345A1",
      "publicationDate": "2024-06-01",
      "source": "USPTO",
      "keywordsMatched": ["quantum computing"]
    }

Troubleshooting

Common Issues
- No data returned – Verify that search URLs are still valid and the ScrapeGraphAI selector matches the current page structure.
- Duplicate entries in Notion – Ensure the “Normalize Data” code correctly checks for existing URLs or IDs before insert.

Performance Tips
- Limit the number of keywords or schedule the workflow during off-peak hours to reduce API throttling.
- Enable caching inside ScrapeGraphAI (if available) to minimize repeated requests.

Pro Tips:
- Use environment variables (e.g., {{ $env.PATENT_KEYWORDS }}) to manage keyword lists without editing nodes.
- Chain an additional “HTTP Request → ML Model” step to auto-classify patents by CPC codes.
- Create a Notion view filtered by “publicationDate is within past 30 days” for quick scanning.
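The query-building step of the patent workflow (one search URL per keyword and data source) could look roughly like the sketch below. It is not the template's actual Code node: the `sources` entries use placeholder `patents.example.com` URLs, and the helper names `parseKeywords` and `buildQueries` are hypothetical. The only assumption about the input is that the keyword list arrives as a comma-separated string, for example from `PATENT_KEYWORDS`.

```javascript
// Hypothetical search endpoints; replace them with the real portal URLs you monitor.
const sources = [
  { name: 'USPTO', searchUrl: 'https://patents.example.com/uspto/search?q=' },
  { name: 'EPO', searchUrl: 'https://patents.example.com/epo/search?q=' },
];

// Parse a comma-separated keyword string (assumed format of PATENT_KEYWORDS).
function parseKeywords(raw) {
  return raw
    .split(',')
    .map((keyword) => keyword.trim())
    .filter((keyword) => keyword.length > 0);
}

// Build one n8n-style item ({ json: ... }) per keyword/source pair.
function buildQueries(keywords, sourceList) {
  const items = [];
  for (const keyword of keywords) {
    for (const source of sourceList) {
      items.push({
        json: {
          keyword,
          source: source.name,
          // encodeURIComponent keeps spaces and symbols safe inside the query string.
          url: source.searchUrl + encodeURIComponent(keyword),
        },
      });
    }
  }
  return items;
}

const queries = buildQueries(parseKeywords('quantum computing, Li-ion battery'), sources);
console.log(queries.map((item) => item.json.url));
```

Inside an n8n Code node the function body would end with `return buildQueries(...)`, so that SplitInBatches receives one item per query.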
by Dinakar Selvakumar
Description

This workflow is Part 2 of the HR Client Acquisition system and builds on the lead discovery pipeline from the previous workflow:

🔗 HR Client Acquisition (Part 1) – Job Lead Discovery & AI Qualification System

In Part 1, job leads are discovered and qualified using AI. In this workflow (Part 2), those qualified companies are enriched further by identifying company domains, classifying them as employers or agencies, and extracting decision-maker contacts such as HR and operations leaders.

The workflow uses AI, web scraping, and enrichment APIs to transform raw company data into actionable outreach-ready leads with verified contact information.

Use cases
• Recruitment agencies targeting companies actively hiring
• Sales teams building high-quality outbound lead lists
• Automating employer research and contact discovery
• Enriching job-based leads into decision-maker pipelines

Requirements
• Google Sheets account
• OpenAI API key
• Apify account
• LinkFinder AI API key
• n8n instance with environment variables configured

How to use
• Run Part 1 workflow to collect and qualify job leads
• Ensure companies are stored in Google Sheets with status = NEW or ENRICHMENT_REQUIRED
• Execute this workflow
• Workflow enriches companies and extracts key contacts
• Contacts are saved into Google Sheets for outreach

Customising this workflow
• Modify AI prompts for company classification
• Adjust industry filtering logic
• Change contact selection priorities (HR, operations, leadership)
• Integrate additional enrichment or CRM tools

What this template demonstrates
• Multi-step lead enrichment pipeline
• AI-powered company classification
• Domain extraction and validation
• Contact discovery using external APIs
• Structured data handling using Google Sheets

How it works
• Step 1: Fetch qualified companies from Google Sheets
• Step 2: Extract or predict company domain
• Step 3: Scrape and analyze website content using AI
• Step 4: Classify company as employer or agency
• Step 5: Enrich employer data and fetch employee contacts
• Step 6: Select top contacts and store results

Setup steps
• Estimated setup time: 15–25 minutes
• Configure API keys (OpenAI, Apify, LinkFinder)
• Set environment variables for Google Sheets
• Connect output of Part 1 workflow
• Run test execution with sample data
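Step 6 of the enrichment workflow (selecting top contacts by priority) is the part most likely to be customised, so a rough sketch may help. Everything here is an assumption for illustration: the contact fields (`name`, `title`, `email`), the regular expressions used to bucket job titles, and the default limit of three contacts. The priority order follows the roles the description names (HR, operations, leadership).

```javascript
// Assumed priority order, taken from the roles named in the description.
const ROLE_PRIORITY = ['hr', 'operations', 'leadership'];

// Map a free-text job title to one of the assumed role buckets.
function classifyTitle(title) {
  const t = title.toLowerCase();
  if (/\b(hr|human resources|talent|recruit\w*|people)\b/.test(t)) return 'hr';
  if (/\b(operations|ops|coo)\b/.test(t)) return 'operations';
  if (/\b(ceo|founder|owner|director|president|vp)\b/.test(t)) return 'leadership';
  return 'other';
}

// Lower rank means higher priority; unknown roles sort last.
function rank(role) {
  const index = ROLE_PRIORITY.indexOf(role);
  return index === -1 ? ROLE_PRIORITY.length : index;
}

// Keep only contacts with an email, order them by role priority, and cap the list.
function selectTopContacts(contacts, limit = 3) {
  return contacts
    .filter((contact) => Boolean(contact.email))
    .map((contact) => ({ ...contact, role: classifyTitle(contact.title || '') }))
    .sort((a, b) => rank(a.role) - rank(b.role))
    .slice(0, limit);
}

const sample = [
  { name: 'A', title: 'CEO', email: 'a@example.com' },
  { name: 'B', title: 'HR Manager', email: 'b@example.com' },
  { name: 'C', title: 'Operations Lead', email: '' },
  { name: 'D', title: 'Director of Operations', email: 'd@example.com' },
];
console.log(selectTopContacts(sample, 2).map((contact) => contact.name));
```

Changing the contact selection priorities then amounts to reordering `ROLE_PRIORITY` or adjusting the title patterns.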
by Nguyen Thieu Toan
Monitor Facebook Pages and Analyze Content Safety via Telegram

This n8n template automates the collection, storage, and safety analysis of Facebook posts while simultaneously providing an interactive AI assistant on Telegram. If you manage communities or brand pages and need to stay instantly informed about toxic content while having a smart assistant to answer quick operational queries, this workflow is perfect for you.

How it works
- **Interactive Chatbot (Trigger):** The Telegram Trigger listens for direct messages. An **AI Agent** (powered by Google Gemini) processes the input using **MongoDB** for conversation memory and custom tools (like **SerpAPI**) for deep research.
- **Data Scraping (Schedule):** A Schedule Trigger runs every 3 hours to fetch the latest posts from your specified Facebook page using the **Apify Facebook Scraper**.
- **Data Normalization & Storage:** Extracted posts are normalized and upserted into an **n8n Data Table**. This prevents duplicate processing of the same posts in future runs.
- **Safety Analysis:** Post text and downloaded images are merged and sent to a secondary AI Agent. The AI evaluates the context and user reactions to flag the content as "Safe" or "Toxic".
- **Smart Notification:** The safety report is formatted using Telegram HTML and dispatched directly to the admin's Telegram inbox.

How to use
1. Connect your Telegram Bot API credentials in both the Telegram Trigger and Send nodes.
2. Connect your Google Gemini API key in all Language Model nodes.
3. Connect your Apify API credentials and SerpAPI key.
4. Configure your MongoDB connection for the chat memory nodes.
5. Create an n8n Data Table (e.g., facebook_news_db) with a postId column (Number) and update the Data Table Upsert node to select your table.
6. Customize the Set Context (Chat) and Set Context (Scraper) nodes with your specific details (Telegram Admin ID, Facebook Page URL, Bot Name).
7. Activate the workflow and let the automation run.

Requirements
- **n8n Version:** Built and tested on **n8n 2.9.4+**. (Note: You may encounter errors on older versions. It is highly recommended to update to the latest n8n version to use this workflow effectively.)
- **Google Gemini** API credentials.
- **Telegram Bot** token.
- **Apify** API credentials.
- **SerpAPI** credentials.
- **MongoDB** connection string.
- An active n8n Data Table.

Customizing this workflow
- **Change the scraper:** Swap the Apify node with any other social media scraping tool or RSS feed to monitor different platforms (e.g., X, LinkedIn).
- **Change the database:** Replace the MongoDB Chat Memory node with Postgres or another memory node if you prefer a different database structure.
- **Modify the AI persona:** Update the system prompt in the AI Agent nodes to change the chatbot's tone or the strictness of the safety evaluation.

About the Author
Created by: Nguyen Thieu Toan (Jay Nguyen)
Email: me@nguyenthieutoan.com
Website: nguyenthieutoan.com
Company: GenStaff (genstaff.net)
Socials (Facebook / X / LinkedIn): @nguyenthieutoan
More templates: n8n.io/creators/nguyenthieutoan
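When a safety report is sent with Telegram's HTML parse mode, any `<`, `>`, `&`, or `"` coming from the post text or URL has to be escaped, otherwise the message can be rejected or rendered incorrectly. The sketch below shows one way the Facebook monitoring workflow's notification text could be assembled. The report fields (`verdict`, `postUrl`, `reason`) and the function names are hypothetical; the template's real field names may differ.

```javascript
// Escape the characters that Telegram's HTML parse mode treats as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the message body from an assumed report shape: { verdict, postUrl, reason }.
function formatSafetyReport(report) {
  const icon = report.verdict === 'Toxic' ? '🚨' : '✅';
  return [
    `${icon} <b>${escapeHtml(report.verdict)}</b>`,
    `<b>Post:</b> <a href="${escapeHtml(report.postUrl)}">open on Facebook</a>`,
    `<b>Reason:</b> ${escapeHtml(report.reason)}`,
  ].join('\n');
}

const message = formatSafetyReport({
  verdict: 'Toxic',
  postUrl: 'https://example.com/post?a=1&b=2',
  reason: 'Insults aimed at <user>',
});
console.log(message);
```

The resulting string would go into the Telegram Send node's text field with the parse mode set to HTML.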
by Atta
Never guess your SEO strategy again. This advanced workflow automates the most time-consuming part of SEO: auditing competitor articles and identifying exactly where your brand can outshine them. It extracts deep content from top-ranking URLs, compares it against your specific brand identity, and generates a ready-to-use "Action Plan" for your content team.

The workflow uses Decodo for high-fidelity scraping, Gemini 2.5 Flash for strategic gap analysis, and Google Sheets as a dynamic "Brand Brain" and reporting dashboard.

✨ Key Features
- **Brand-Centric Auditing:** Unlike generic SEO tools, this engine uses a live Google Sheet containing your **Brand Identity** to find "Content Gaps" specific to your unique value proposition.
- **Automated SERP Itemization:** Converts a simple list of keywords into a filtered list of top-performing competitor URLs.
- **Deep Markdown Extraction:** Uses Decodo Universal to bypass bot-blockers and extract clean Markdown content, preserving headers and structure for high-fidelity AI analysis.
- **Structured Action Plans:** Outputs machine-readable JSON containing the competitor's H1, their "Winning Factor," and a 1-sentence "Checkmate" instruction for your writers.

⚙️ How it Works
1. Data Foundation: The workflow triggers (Manual or Scheduled) and pulls your Global Config (e.g., result limits) and Brand Identity from a dedicated Google Sheet.
2. Market Discovery: It retrieves your target keywords and uses the Decodo Google Search node to identify the top competitors. A Code Node then "itemizes" these results into individual URLs.
3. Intelligence Harvesting: Decodo Universal scrapes each URL, and an HTML 5 node extracts the body content into Markdown format to minimize token noise for the AI.
4. Strategic Audit: The AI Content Auditor (powered by Gemini) receives the competitor’s text and your Brand Identity. It identifies what the competitor missed that your brand excels at.
5. Reporting Deck: The final Strategy Master Writer node appends the analysis, including the "Content Gap" and "Action Plan", into a master Google Sheet for your marketing team.

📥 Component Installation

This workflow relies on the Decodo node for search and scraping precision.
- Install Node: Click the + button in n8n, search for "Decodo," and add it to your canvas.
- Credentials: Use your Decodo API key. (Tip: Use a residential proxy setting for difficult sites like Reddit or Stripe.)
- Gemini: Ensure you have the Google Gemini Chat Model node connected to the AI Agent.

🎁 Get a free Web Scraping API subscription here 👉🏻 https://visit.decodo.com/X4YBmy

🛠️ Setup Instructions

1. Google Sheets Configuration

Create a spreadsheet with the following three tabs:
- **Target Keywords**: One column named Target Keyword.
- **Brand Identity**: One cell containing your brand mission, USPs, and target audience.
- **Competitor Audit Feed**: Headers for Keyword, URL, Rank, Winning Factor, Content Gap, and Action Plan.

Clone the spreadsheet here.

2. Global Configuration

In the Config (Set) node, define your serp_results_amount (e.g., 10). This controls how many competitors are analyzed per keyword.

➕ How to Adapt the Template
- **Competitor Exclusion:** Add a **Filter** node after "Market Discovery" to automatically skip domains like amazon.com or reddit.com if they aren't relevant to your niche.
- **Slack Alerts:** Connect a **Slack** node after the AI analysis to notify your content manager immediately when a high-impact "Action Plan" is generated for a priority keyword.
- **Multi-Model Verification:** Swap Gemini with **Claude 3.5 Sonnet** or **GPT-4o** in the Strategic Audit section to compare different AI perspectives on the same competitor content.
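The "itemize" step of the SEO audit workflow, combined with the competitor exclusion idea, can be sketched as a single function. The shape of a search result (`url`, `position`) is an assumption, since the Decodo node's real output format is not shown in the description, and `itemizeResults` is a hypothetical name. The `limit` option plays the role of `serp_results_amount`.

```javascript
// Turn a list of search results into one n8n-style item per competitor URL.
// Assumed result shape: { url: string, position: number }.
function itemizeResults(keyword, results, { limit = 10, excludedDomains = [] } = {}) {
  const items = [];
  for (const result of results) {
    let hostname;
    try {
      hostname = new URL(result.url).hostname.replace(/^www\./, '');
    } catch {
      continue; // skip malformed URLs instead of failing the whole run
    }
    // Exclude the domain itself and any of its subdomains.
    const excluded = excludedDomains.some(
      (domain) => hostname === domain || hostname.endsWith('.' + domain)
    );
    if (excluded) continue;
    items.push({ json: { keyword, url: result.url, rank: result.position } });
    if (items.length >= limit) break;
  }
  return items;
}

const serp = [
  { url: 'https://www.reddit.com/r/seo', position: 1 },
  { url: 'https://blog.example.com/guide', position: 2 },
  { url: 'not a url', position: 3 },
  { url: 'https://example.org/post', position: 4 },
];
console.log(itemizeResults('seo audit', serp, { excludedDomains: ['reddit.com'] }));
```

Filtering inside the Code node is one option; a separate Filter node, as the template suggests, keeps the exclusion list visible on the canvas.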
by Bhuvanesh R
Your Cold Email is Now Researched. This pipeline finds specific bottlenecks on prospect websites and instantly crafts an irresistible pitch.

🎯 Problem Statement

Traditional high-volume cold email outreach is stuck on generic personalization (e.g., "Love your website!"). Sales teams, especially those selling high-value AI Receptionists, struggle to efficiently find the one Unique Operational Hook (like manual scheduling dependency or high call volume) needed to make the pitch relevant. This forces reliance on expensive, slow manual research, leading to low reply rates and inefficient spending on bulk outreach tools.

✨ Solution

This workflow deploys a resilient Dual-AI Personalization Pipeline that runs on a batch basis. It uses the Filter (Qualified Leads) node as a cost-saving Quality Gate to prevent processing bad leads. It executes a Targeted Deep Dive on successful leads, using GPT-4 for analytical insight extraction and Claude Sonnet for coherent, human-like copy generation. The entire process outputs campaign-ready data directly to Google Sheets and sends a critical QA Draft via Gmail.

⚙️ How It Works (Multi-Step Execution)

1. Ingestion and Cost Control (The Quality Gate)
- **Trigger and Ingestion:** The workflow starts via a **Manual Trigger**, pulling leads directly from **Get All Leads (Google Sheets)**.
- **Cost Filtering:** The **Filter (Qualified Leads)** node removes leads that lack a working email or website URL.
- **Execution Isolation:** The **Loop Over Leads** node initiates individual processing. The **Capture Lead Data (Set)** node immediately captures and locks down the original lead context for stability throughout the loop.
- **Hybrid Scraping:** The **Scrape Site (HTTP Request)** and **Extract Text & Links (HTML)** nodes execute the **Hybrid Scraping** strategy, simultaneously capturing **website text** and **external links**.
- **Data Shaping & Status:** The **Filter Social & Status (Code)** node is the control center. It filters links, bundles the context, and critically, assigns a **status** of 'Success' or 'Scrape Fail'.
- **Cost Control Branch:** The **If (IF node)** checks this status. Items with 'Scrape Fail' bypass all AI steps (saving 100% of AI token costs) and jump directly to **Log Final Result**. Successful items proceed to the AI core.

2. Dual-AI Coherence & Dispatch (The Executive Output)
- **Analytical Synthesis:** The **Summarize Website (OpenAI)** node uses **GPT-4** to synthesize the full context and extract the **Unique Operational Hook** (e.g., manual booking overhead).
- **Coherent Copy Generation:** The **Generate Subject & Body (Anthropic)** node uses the **Claude Sonnet** model to generate the subject and the multi-line body, guaranteeing **coherence** by creating both simultaneously in a single JSON output.
- **Final Parsing:** The **Parse AI Output (Code)** node reliably strips markdown wrappers and extracts the clean **subject** and **body** strings.
- **Final Delivery:** The data is logged via **Log Final Result (Google Sheets)**, and the completed email is sent to the user via **Create a draft (Gmail)** for final Quality Assurance before sending.

🛠️ Setup Steps

Before running the workflow, ensure these credentials and data structures are correctly configured:

Credentials
- **Anthropic:** Configure credentials for the Language Model (Claude Sonnet).
- **OpenAI:** Configure credentials for the Analytical Model (GPT-4/GPT-4o).
- **Google Services:** Set up OAuth2 credentials for **Google Sheets** (Input/Output) and **Gmail** (Draft QA and Completion Alert).

Configuration
- **Google Sheet Setup:** Your input sheet must include the columns **email**, **website_url**, and an empty **Icebreaker** column for initial filtering.
- **HTTP URL:** Verify that the **Scrape Site** node's URL parameter is set to pull the website URL from the stabilized data structure: ={{ $json.website_url }}.
- **AI Prompts:** Ensure the Anthropic prompt contains your current Irresistible Sales Offer and the required nested JSON output structure.

✅ Benefits
- **Coherence Guarantee:** A single **Anthropic** node generates both the subject and body, guaranteeing the message is perfectly aligned and hits the same unique insight.
- **Maximum Cost Control:** The **IF node** prevents spending tokens on bad or broken websites, making the campaign highly **budget-efficient**.
- **Deep Personalization:** Combines **website text** and **social media links**, creating an icebreaker that implies thorough, manual research.
- **High Reliability:** Uses robust **Code nodes** for data structuring and parsing, ensuring the workflow runs consistently under real-world conditions without crashing.
- **Zero-Risk QA:** The final **Gmail (Create a draft)** step ensures human review of the generated copy before any cold emails are sent out.
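The "Parse AI Output" step in the cold email pipeline has to cope with models that wrap their JSON in a Markdown code fence. A minimal sketch of that parsing is shown below. The nested key `email` is a guess at the "required nested JSON output structure", which the description does not spell out, and the function names are hypothetical; the sketch also accepts a flat `{ subject, body }` object.

```javascript
// Three backticks, built at runtime so the example stays readable inside Markdown.
const FENCE = '`'.repeat(3);

// Remove a leading fence line (with optional language tag) and a trailing fence.
function stripFence(raw) {
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    const firstNewline = text.indexOf('\n');
    text = firstNewline === -1 ? '' : text.slice(firstNewline + 1);
  }
  if (text.endsWith(FENCE)) {
    text = text.slice(0, -FENCE.length);
  }
  return text.trim();
}

// Parse the model reply and return clean subject/body strings.
// Assumed shapes: { email: { subject, body } } or { subject, body }.
function parseAiOutput(raw) {
  const parsed = JSON.parse(stripFence(raw));
  const payload = parsed.email ?? parsed;
  if (typeof payload.subject !== 'string' || typeof payload.body !== 'string') {
    throw new Error('AI output is missing "subject" or "body"');
  }
  return { subject: payload.subject.trim(), body: payload.body.trim() };
}

const reply = FENCE + 'json\n{"email":{"subject":"Hi","body":"Line 1\\nLine 2"}}\n' + FENCE;
console.log(parseAiOutput(reply));
```

Throwing on a malformed reply makes the failure visible in the execution log; a production node might instead set a status field so the item can still reach the logging step.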
by phil
This workflow is designed for B2B professionals to automatically identify and summarize business opportunities from a company's website. By leveraging Bright Data's Web Unblocker and advanced AI models from OpenRouter, it scrapes relevant company pages ("About Us", "Team", "Contact"), analyzes the content for potential pain points and needs, and synthesizes a concise, actionable report. The final output is formatted for direct use in documents, making it an ideal tool for sales, marketing, and business development teams to prepare for prospecting calls or personalize outreach.

Who's it for

This template is ideal for:
- **B2B Sales Teams:** Quickly find and qualify leads by identifying specific business needs before a cold call.
- **Marketing Agencies:** Develop personalized content and value propositions based on a prospect's public website information.
- **Business Development Professionals:** Efficiently research potential partners or clients and discover collaboration opportunities.
- **Entrepreneurs:** Gain a competitive edge by understanding a competitor's strategy or a potential client's operations.

How it works
- The workflow is triggered by a chat message, typically a URL from an n8n chat application.
- It uses Bright Data to scrape the website's sitemap and extract all anchor links from the homepage.
- An AI agent analyzes the extracted URLs to filter for pages relevant to company information (e.g., "about-us," "team," "contact").
- The workflow then scrapes the content of these specific pages.
- A second AI agent summarizes the content of each page, looking for business opportunities related to AI-powered automation.
- The summaries are merged and a final AI agent synthesizes them into a single, cohesive report, formatted for easy reading in a Google Doc.

How to set up
- Bright Data Credentials: Sign up for a Bright Data account and create a Web Unblocker zone. In n8n, create new Bright Data API credentials and paste your API key.
- OpenRouter Credentials: Create an account on OpenRouter and get your API key. In n8n, create new OpenRouter API credentials and paste your key.
- Chat Trigger Node: Configure the "When chat message received" node. Copy the production webhook URL to integrate with your preferred chat platform.

Requirements
- An active n8n instance.
- A Bright Data account with a Web Unblocker zone.
- An OpenRouter account with API access.

How to customize this workflow
- **AI Prompting:** Edit the "systemMessage" parameters in the "AI Agent", "AI Agent1", and "AI Agent2" nodes to change the focus of the opportunity analysis. For example, modify the prompts to search for specific technologies, industry jargon, or different types of business challenges.
- **Model Selection:** The workflow uses openai/o4-mini and openai/gpt-5. You can change these to other models available on OpenRouter by editing the model parameter in the OpenRouter Chat Model nodes.
- **Scraping Logic:** The extract url node uses a regular expression to find `<a>` tags. This can be modified or replaced with an HTML Extraction node to target different elements or content on a website.
- **Output Format:** The final output is designed for Google Docs. You can modify the last "AI Agent2" node's prompt to generate the output in a different format, such as a simple JSON object or a markdown list.

Phil | Inforeole 🇫🇷 Contact us to automate your processes
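For readers who want to adjust the scraping logic of the opportunity-finder workflow, the sketch below shows one way a regular expression can pull `href` values out of anchor tags and turn them into absolute URLs. It is not the template's actual expression. The same-host filter and the fragment stripping are assumptions added for illustration, and `extractLinks` is a hypothetical name.

```javascript
// Extract unique, absolute, same-host links from raw HTML.
function extractLinks(html, baseUrl) {
  // Matches <a ... href="..."> or <a ... href='...'> and captures the href value.
  const anchorPattern = /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["'][^>]*>/gi;
  const base = new URL(baseUrl);
  const links = new Set();
  for (const match of html.matchAll(anchorPattern)) {
    let url;
    try {
      url = new URL(match[1], base); // resolves relative paths against the base
    } catch {
      continue; // ignore values that cannot be parsed as URLs
    }
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue; // skip mailto:, tel:, etc.
    if (url.hostname !== base.hostname) continue; // assumption: keep internal pages only
    url.hash = ''; // treat /about-us and /about-us#top as the same page
    links.add(url.href);
  }
  return [...links];
}

const homepage =
  '<a href="/about-us">About</a> <a class="x" href=\'team\'>Team</a> ' +
  '<a href="mailto:hi@example.com">Mail</a> <a href="https://other.com/">Ext</a> ' +
  '<a href="/about-us#top">Again</a>';
console.log(extractLinks(homepage, 'https://example.com/'));
```

A regular expression is a pragmatic choice for well-formed pages; for irregular markup the HTML Extraction node mentioned above is usually more robust.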
by Davide
This workflow automates the entire process of collecting, analyzing, and reporting customer reviews from Feedaty (similar to Trustpilot) using ScrapeGraphAI. It transforms raw user feedback into a structured, management-ready reputation report in PDF using the new Gemini 3 model and ConvertAPI, then uploads the report to Google Drive.

Key Advantages

✅ End-to-End Automation
From data collection to final PDF delivery, the entire reputation analysis process is fully automated, eliminating manual scraping, copy-paste work, and reporting overhead.

✅ AI-Driven, Management-Ready Insights
The workflow does not just summarize reviews; it interprets them strategically, producing insights that are immediately useful for:
- Management
- Marketing
- Customer Support
- Operations
- Product & UX teams

✅ Structured & Consistent Reporting
Every execution produces reports with the same structure, metrics, and logic, making it ideal for:
- Periodic reputation monitoring
- Trend analysis over time
- Internal performance reviews

✅ Scalable & Configurable
- Easily adaptable to any Feedaty company profile
- Page limits and review volume can be adjusted without changing logic
- Can be scheduled or extended to multiple brands

✅ Data Quality & Compliance
- No personal data exposure
- Explicit handling of missing or ambiguous information
- No assumptions or hallucinated insights
- Fully transparent and audit-friendly output

✅ Seamless Stakeholder Distribution
Automatic upload to Google Drive ensures reports are centralized, shareable, and accessible, with no additional manual steps.

Ideal Use Cases
- Brand & reputation monitoring
- Customer experience audits
- Quarterly or monthly executive reports
- Pre-sales or investor documentation
- Customer support performance evaluation

How it works

This workflow automates the entire process of collecting, analyzing, and reporting customer feedback from Feedaty. It starts by scraping live reviews from a specified company's Feedaty page using ScrapeGraphAI, extracting review details like date, rating, and text. Each review is then individually analyzed for sentiment (Positive, Neutral, or Negative) using an AI model. All processed reviews are aggregated and passed to a specialized AI agent that performs a comprehensive company-level reputation analysis, generating a structured management report. Finally, the report is converted into an HTML/PDF format and uploaded to a designated Google Drive folder, creating a fully automated pipeline from data collection to actionable insights delivery.

Set up steps
1. Configure Parameters: Set the Feedaty company identifier (e.g., maxisport) and the maximum number of review pages to scrape in the "Set Parameters" node.
2. API Credentials: Ensure the following credentials are configured in n8n:
   - ScrapeGraphAI API (for web scraping)
   - Google Gemini API (for AI sentiment analysis and report generation)
   - Google Drive OAuth2 (for file upload)
   - ConvertAPI (for HTML to PDF conversion)
3. Customize Output: Optionally adjust the "Limit reviews" node to control the number of reviews processed and modify the AI agent's system prompt in "Company Reputation Management" to tailor the report format.
4. Destination Folder: Verify the Google Drive folder ID in the "Upload file" node points to the correct destination for the generated reports.
5. Execution: Trigger the workflow manually via the "When clicking ‘Test workflow’" node to run the complete scraping, analysis, and reporting pipeline.

👉 Subscribe to my new YouTube channel. Here I’ll share videos and Shorts with practical tutorials and FREE templates for n8n.

Need help customizing? Contact me for consulting and support or add me on LinkedIn.
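For the Feedaty workflow above, the aggregation step between per-review sentiment analysis and the company-level report could be sketched as follows. The review shape (`rating`, `sentiment`) and the summary fields are assumptions; the template may pass a different structure to the reporting agent, and `summarizeReviews` is a hypothetical name.

```javascript
// Aggregate per-review results into counts and an average rating.
// Assumed review shape: { rating?: number, sentiment: 'Positive' | 'Neutral' | 'Negative' }.
function summarizeReviews(reviews) {
  const counts = { Positive: 0, Neutral: 0, Negative: 0 };
  let ratingSum = 0;
  let ratedCount = 0;
  for (const review of reviews) {
    // Count only the three known labels; anything else is ignored rather than guessed.
    if (Object.prototype.hasOwnProperty.call(counts, review.sentiment)) {
      counts[review.sentiment] += 1;
    }
    if (typeof review.rating === 'number') {
      ratingSum += review.rating;
      ratedCount += 1;
    }
  }
  return {
    total: reviews.length,
    counts,
    // null signals "no rating available" instead of a misleading 0.
    averageRating: ratedCount > 0 ? Math.round((ratingSum / ratedCount) * 100) / 100 : null,
  };
}

const summary = summarizeReviews([
  { rating: 5, sentiment: 'Positive' },
  { rating: 4, sentiment: 'Positive' },
  { rating: 1, sentiment: 'Negative' },
  { sentiment: 'Neutral' },
]);
console.log(summary);
```

Passing explicit counts to the report agent, rather than only the raw reviews, is one way to keep the numbers in the final report consistent across executions.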
by Davide
This workflow automatically generates an llms.txt file (following the llmstxt.org specification) for any given website. It uses ScrapegraphAI to crawl and scrape pages, an OpenAI chat model to process content, and finally uploads the generated file via FTP. Key Advantages 1. ✅ Automated llms.txt Generation The workflow fully automates the creation of a compliant llms.txt file, eliminating the need for manual documentation and reducing maintenance time. 2. ✅ AI-Powered Website Understanding Using OpenAI and ScrapeGraphAI, the system intelligently analyzes: Website structure Internal pages Titles and descriptions Content relevance Logical page categorization This produces a high-quality output specifically optimized for AI systems and LLM indexing. 3. ✅ Dynamic Internal Link Discovery The crawler automatically extracts all internal links from the website, making the workflow scalable for: Small business websites Large corporate websites Ecommerce stores Blogs and documentation portals 4. ✅ Intelligent Content Categorization Pages are automatically grouped into meaningful sections such as: Main Pages Services Products Portfolio Blog Company Contact Legal / Optional pages This improves readability and machine interpretability. 5. ✅ Multilingual Support The workflow preserves the original language of the website content, ensuring consistency and localization for international projects. 6. ✅Fully Automated Publishing After generation, the workflow converts the output into a .txt file and uploads it directly to an FTP server or CDN, enabling instant deployment without manual intervention. 7. ✅ Reduced Manual Work* The entire process — from crawling to publishing — is automated inside n8n, significantly reducing operational effort for SEO teams, developers, and AI optimization workflows. 8. 
✅ AI & SEO Optimization The generated llms.txt file helps: AI crawlers better understand the website Improve AI discoverability Structure content for LLM consumption Support future AI search indexing strategies 9. ✅ Modular and Scalable Architecture The workflow is built with reusable components: Crawler module Status monitoring AI analysis agent Scraper tool Binary conversion FTP deployment This makes it easy to extend, customize, or integrate into larger automation systems. Ideal Use Cases AI-ready website optimization Automated SEO infrastructure LLM indexing preparation Agency website automation Large-scale multi-site management Documentation platforms AI search visibility enhancement How it works The process begins when the workflow is manually triggered. It then: Starts a crawl of the specified domain using ScrapegraphAI’s smartcrawler. The crawler extracts all internal links from the domain (acting like a sitemap generator). Waits for the crawl to complete (configurable wait time, default 20 units). Checks the crawler’s status – if the crawl is still processing, the workflow waits again; if successful, it proceeds. Extracts the discovered internal links and passes them to an AI agent. Uses an AI agent (with OpenAI GPT) that: Receives the list of internal URLs. Uses a Scraper tool (via ScrapegraphAI) to scrape each URL’s content. Follows a strict prompt to: Analyze the homepage (title, description, language). Extract concise descriptions for each internal page. Group pages into logical sections (Main pages, Services, Portfolio, Contact, Optional, etc.). Generate a clean Markdown file (llms.txt) following the official spec. Converts the Markdown output into a binary file (llms.txt). Uploads the file to an FTP server (configured for BunnyCDN or any FTP storage). Ends the workflow once the upload is complete. The AI agent is explicitly forbidden from inventing content – it must call the Scraper tool for every URL before describing it. 
The output is pure Markdown, starting with #. Setup steps To use this workflow in n8n, follow these steps: 1. Prerequisites An n8n instance (self-hosted or cloud). A ScrapegraphAI account with API access. An OpenAI account with API key (model used: gpt-5.4-mini – note: this may be a custom/typo; usual models are gpt-4o-mini or gpt-4). An FTP server (traditional FTP, or SFTP if modified). 2. Configure credentials in n8n Go to Credentials in n8n and add: ScrapegraphAI API** Name: ScrapegraphAI account API Key: your ScrapegraphAI API key OpenAI API** Name: OpenAi account (Eure) API Key: your OpenAI API key FTP** Name: FTP BunnyCDN Host, Port, Username, Password (or SSH key) for your FTP server 3. Modify the domain In the Set domain node, change the your_domain to your target domain (e.g., example.com). Do not include https:// – only the domain name. 4. Adjust wait time (optional) In the Wait node, change the amount (default 20) to a higher value if the target site is large or slow to crawl. 5. Update FTP upload path In the Upload to FTP node, update the path field. Currently it is: =/YOUR_PATH/{{$binary.data.fileName}} Change YOUR_PATH to the actual remote directory (e.g., /public_html/). The file will be saved as llms.txt. 6. (Optional) Modify the AI prompt The prompt inside the LLMS.txt Agent node can be adapted for: Different section names Different output structure Different languages Exclusion of certain URL patterns 7. Activate and execute Save the workflow. Toggle Active to enable manual execution. Click ‘Execute workflow’ on the Manual Trigger node. Monitor execution – the workflow will wait for the crawl, then process all pages, and upload the final file. 8. Verify Check your FTP server for the generated llms.txt. Test it by opening in a text editor – it should be pure Markdown starting with # Site name. 👉 Subscribe to my new YouTube channel. Here I’ll share videos and Shorts with practical tutorials and FREE templates for n8n. Need help customizing? 
Contact me for consulting and support or add me on Linkedin.
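The grouping and rendering step of the llms.txt workflow above (homepage summary, then pages grouped into sections) can be sketched in Python. This is only an illustration, not the prompt or logic used by the template's AI agent: the `Page` structure, the `render_llms_txt` function, and the exact layout (H1 title, blockquote summary, H2 sections with link lists) are assumptions based on the commonly described llms.txt format.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One scraped internal page (hypothetical structure for this sketch)."""
    title: str
    url: str
    description: str
    section: str


def render_llms_txt(site_name: str, summary: str, pages: list[Page]) -> str:
    """Render an llms.txt-style Markdown document with pages grouped by section."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    # Group pages by section while keeping the order in which sections first appear.
    sections: dict[str, list[Page]] = {}
    for page in pages:
        sections.setdefault(page.section, []).append(page)
    for section, members in sections.items():
        lines.append(f"## {section}")
        lines.append("")
        for page in members:
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"
```

In the workflow itself the descriptions come from the Scraper tool; here they would simply be passed in as already-scraped strings.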
by Mariela Slavenova
This template crawls a website from its sitemap, deduplicates URLs in Supabase, scrapes pages with Crawl4AI, cleans and validates the text, then stores content + metadata in a Supabase vector store using OpenAI embeddings. It’s a reliable, repeatable pipeline for building searchable knowledge bases, SEO research corpora, and RAG datasets.

Good to know
• Built-in de-duplication via a scrape_queue table (status: pending/completed/error).
• Resilient flow: waits, retries, and marks failed tasks.
• Costs depend on Crawl4AI usage and OpenAI embeddings.
• Replace any placeholders (API keys, tokens, URLs) before running.
• Respect website robots/ToS and applicable data laws when scraping.

How it works
1. Sitemap fetch & parse: Load sitemap.xml, extract all URLs.
2. De-dupe: Normalize URLs, check Supabase scrape_queue; insert only new ones.
3. Scrape: Send URLs to Crawl4AI; poll task status until completed.
4. Clean & score: Remove boilerplate/markup, detect content type, compute quality metrics, extract metadata (title, domain, language, length).
5. Chunk & embed: Split text, create OpenAI embeddings.
6. Store: Upsert into Supabase vector store (documents) with metadata; update job status.

Requirements
• Supabase (Postgres + Vector extension enabled)
• Crawl4AI API key (or header auth)
• OpenAI API key (for embeddings)
• n8n credentials set for HTTP, Postgres/Supabase

How to use
1. Configure credentials (Supabase/Postgres, Crawl4AI, OpenAI).
2. (Optional) Run the provided SQL to create scrape_queue and documents.
3. Set your sitemap URL in the HTTP Request node.
4. Execute the workflow (manual trigger) and monitor Supabase statuses.
5. Query your documents table or vector store from your app/RAG stack.
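The de-dupe step (normalize URLs, then keep only those not already in scrape_queue) can be illustrated with a small Python sketch. The normalization rules below (lowercase host, drop the fragment and trailing slash) are one possible convention and are an assumption, as are the function names; in the workflow the lookup runs against the Supabase table, whereas here an in-memory set stands in for it.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def new_urls(sitemap_urls: list[str], already_queued: set[str]) -> list[str]:
    """Return normalized URLs that are not queued yet, preserving sitemap order."""
    seen = set(already_queued)  # copy so the caller's set is not mutated
    fresh = []
    for raw in sitemap_urls:
        url = normalize_url(raw)
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

Normalizing before the lookup matters: without it, `https://example.com/a/` and `https://EXAMPLE.com/a` would be queued as two separate pages.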
Potential Use Cases This automation is ideal for: Market research teams collecting competitive data Content creators monitoring web trends SEO specialists tracking website content updates Analysts gathering structured data for insights Anyone needing reliable, structured web content for analysis Need help customizing? Contact me for consulting and support: LinkedIn
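The "poll task status until completed" step of the pipeline above follows a common bounded-polling pattern, sketched here in Python. This is not the template's own node configuration: `get_status` is a hypothetical callable standing in for the HTTP status request, and the status strings are placeholders, not values documented by Crawl4AI.

```python
import time
from typing import Callable, Iterable


def poll_until_done(
    get_status: Callable[[], str],
    done_states: Iterable[str] = ("completed", "failed"),  # placeholder status names
    interval_s: float = 5.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status until it returns a terminal state or attempts run out."""
    done = set(done_states)
    for attempt in range(max_attempts):
        status = get_status()
        if status in done:
            return status
        # Wait between checks, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"task still not finished after {max_attempts} checks")
```

Capping the number of attempts keeps a stuck task from blocking the run indefinitely, so it can be marked as failed instead.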
by Growth AI
SEO Content Generation Workflow - n8n Template Instructions Who's it for This workflow is designed for SEO professionals, content marketers, digital agencies, and businesses who need to generate optimized meta tags, H1 headings, and content briefs at scale. Perfect for teams managing multiple clients or large keyword lists who want to automate competitor analysis and SEO content creation while maintaining quality and personalization. How it works The workflow automates the entire SEO content creation process by analyzing your target keywords against top competitors, then generating optimized meta elements and comprehensive content briefs. It uses AI-powered analysis combined with real competitor data to create SEO-friendly content that's tailored to your specific business context. The system processes keywords in batches, performs Google searches, scrapes competitor content, analyzes heading structures, and generates personalized SEO content using your company's database information for maximum relevance. 
Requirements

Required Services and Credentials
- **Google Sheets API**: For reading configuration and updating results
- **Anthropic API**: For AI content generation (Claude Sonnet 4)
- **OpenAI API**: For embeddings and vector search
- **Apify API**: For Google search results
- **Firecrawl API**: For competitor website scraping
- **Supabase**: For vector database (optional but recommended)

Template Spreadsheet
Copy this template spreadsheet and configure it with your information: Template Link

How to set up

Step 1: Copy and Configure Template
- Make a copy of the template spreadsheet
- Fill in the Client Information sheet:
  - Client name: Your company or client's name
  - Client information: Brief business description
  - URL: Website address
  - Supabase database: Database name (prevents AI hallucination)
  - Tone of voice: Content style preferences
  - Restrictive instructions: Topics or approaches to avoid
- Complete the SEO sheet with your target pages:
  - Page: Page you're optimizing (e.g., "Homepage", "Product Page")
  - Keyword: Main search term to target
  - Awareness level: User familiarity with your business
  - Page type: Category (homepage, blog, product page, etc.)
Step 2: Import Workflow Import the n8n workflow JSON file Configure all required API credentials in n8n: Google Sheets OAuth2 Anthropic API key OpenAI API key Apify API key Firecrawl API key Supabase credentials (if using vector database) Step 3: Test Configuration Activate the workflow Send your Google Sheets URL to the chat trigger Verify that all sheets are readable and credentials work Test with a single keyword row first Workflow Process Overview Phase 0: Setup and Configuration Copy template spreadsheet Configure client information and SEO parameters Set up API credentials in n8n Phase 1: Data Input and Processing Chat trigger receives Google Sheets URL System reads client configuration and SEO data Filters valid keywords and empty H1 fields Initiates batch processing Phase 2: Competitor Research and Analysis Searches Google for top 10 results per keyword Scrapes first 5 competitor websites Extracts heading structures (H1-H6) Analyzes competitor meta tags and content organization Phase 3: Meta Tags and H1 Generation AI analyzes keyword context and competitor data Accesses client database for personalization Generates optimized meta title (65 chars max) Creates compelling meta description (165 chars max) Produces user-focused H1 (70 chars max) Phase 4: Content Brief Creation Analyzes search intent percentages Develops content strategy based on competitor analysis Creates detailed MECE page structure Suggests rich media elements Provides writing recommendations and detail level scoring Phase 5: Data Integration and Updates Combines all generated content into unified structure Updates Google Sheets with new SEO elements Preserves existing data while adding new content Continues batch processing for remaining keywords How to customize the workflow Adjusting AI Models Replace Anthropic Claude with other LLM providers Modify system prompts for different content styles Adjust character limits for meta elements Modifying Competitor Analysis Change number of 
competitors analyzed (currently 5)
- Adjust scraping parameters in Firecrawl nodes
- Modify heading extraction logic in JavaScript nodes

Customizing Output Format
- Update Google Sheets column mapping in Code node
- Modify structured output parser schema
- Change batch processing size in Split in Batches node

Adding Quality Controls
- Insert validation nodes between phases
- Add error handling and retry logic
- Implement content quality scoring

Extending Functionality
- Add keyword research capabilities
- Include image optimization suggestions
- Integrate social media content generation
- Connect to CMS platforms for direct publishing

Best Practices
- Test with small batches before processing large keyword lists
- Monitor API usage and costs across all services
- Regularly update system prompts based on output quality
- Maintain clean data in your Google Sheets template
- Use descriptive node names for easier workflow maintenance

Troubleshooting
- **API Errors**: Check credential configuration and usage limits
- **Scraping Failures**: Firecrawl nodes have error handling enabled
- **Empty Results**: Verify keyword formatting and competitor availability
- **Sheet Updates**: Ensure proper column mapping in final Code node
- **Processing Stops**: Check batch processing limits and timeout settings
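One of the quality controls suggested above is a validation step between phases. As an illustration, the character limits stated in Phase 3 (65 for the meta title, 165 for the meta description, 70 for the H1) could be checked with a sketch like the following. The field names and the function are hypothetical and not taken from the workflow; in n8n the equivalent logic would live in a Code node.

```python
# Limits come from the template text; the field names are hypothetical.
META_LIMITS = {"meta_title": 65, "meta_description": 165, "h1": 70}


def find_overflows(fields: dict[str, str], limits: dict[str, int] = META_LIMITS) -> dict[str, int]:
    """Return {field: characters over the limit} for each field that is too long."""
    overflows = {}
    for name, limit in limits.items():
        extra = len(fields.get(name, "")) - limit
        if extra > 0:
            overflows[name] = extra
    return overflows
```

An empty result means every generated element fits; a non-empty one could route the row back to the generation step or flag it for manual review.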
by Growth AI
SEO Content Generation Workflow (Basic Version) - n8n Template Instructions

Who's it for
This workflow is designed for SEO professionals, content marketers, digital agencies, and businesses who need to generate optimized meta tags, H1 headings, and content briefs at scale. Perfect for teams managing multiple clients or large keyword lists who want to automate competitor analysis and SEO content creation without the complexity of vector databases.

How it works
The workflow automates the entire SEO content creation process by analyzing your target keywords against top competitors, then generating optimized meta elements and comprehensive content briefs. It uses AI-powered analysis combined with real competitor data to create SEO-friendly content that's tailored to your specific business context. The system processes keywords in batches, performs Google searches, scrapes competitor content, analyzes heading structures, and generates personalized SEO content using your company information for maximum relevance.

Requirements

Required Services and Credentials
- **Google Sheets API**: For reading configuration and updating results
- **Anthropic API**: For AI content generation (Claude Sonnet 4)
- **Apify API**: For Google search results
- **Firecrawl API**: For competitor website scraping

Template Spreadsheet
Copy this template spreadsheet and configure it with your information: Template Link

How to set up

Step 1: Copy and Configure Template
- Make a copy of the template spreadsheet
- Fill in the Client Information sheet:
  - Client name: Your company or client's name
  - Client information: Brief business description
  - URL: Website address
  - Tone of voice: Content style preferences
  - Restrictive instructions: Topics or approaches to avoid
- Complete the SEO sheet with your target pages:
  - Page: Page you're optimizing (e.g., "Homepage", "Product Page")
  - Keyword: Main search term to target
  - Awareness level: User familiarity with your business
  - Page type: Category (homepage, blog, product page, etc.)
Step 2: Import Workflow Import the n8n workflow JSON file Configure all required API credentials in n8n: Google Sheets OAuth2 Anthropic API key Apify API key Firecrawl API key Step 3: Test Configuration Activate the workflow Send your Google Sheets URL to the chat trigger Verify that all sheets are readable and credentials work Test with a single keyword row first Workflow Process Overview Phase 0: Setup and Configuration Copy template spreadsheet Configure client information and SEO parameters Set up API credentials in n8n Phase 1: Data Input and Processing Chat trigger receives Google Sheets URL System reads client configuration and SEO data Filters valid keywords and empty H1 fields Initiates batch processing Phase 2: Competitor Research and Analysis Searches Google for top 10 results per keyword using Apify Scrapes first 5 competitor websites using Firecrawl Extracts heading structures (H1-H6) from competitor pages Analyzes competitor meta tags and content organization Processes markdown content to identify heading hierarchies Phase 3: Meta Tags and H1 Generation AI analyzes keyword context and competitor data using Claude Incorporates client information for personalization Generates optimized meta title (65 characters maximum) Creates compelling meta description (165 characters maximum) Produces user-focused H1 (70 characters maximum) Uses structured output parsing for consistent formatting Phase 4: Content Brief Creation Analyzes search intent percentages (informational, transactional, navigational) Develops content strategy based on competitor analysis Creates detailed MECE page structure with H2 and H3 sections Suggests rich media elements (images, videos, infographics, tables) Provides writing recommendations and detail level scoring (1-10 scale) Ensures SEO optimization while maintaining user relevance Phase 5: Data Integration and Updates Combines all generated content into unified structure Updates Google Sheets with new SEO elements Preserves existing 
data while adding new content
- Continues batch processing for remaining keywords

Key Differences from Advanced Version
This basic version focuses on core SEO functionality without additional complexity:
- **No Vector Database**: Removes Supabase integration for simpler setup
- **Streamlined Architecture**: Fewer dependencies and configuration steps
- **Essential Features Only**: Core competitor analysis and content generation
- **Faster Setup**: Reduced time to deployment
- **Lower Costs**: Fewer API services required

How to customize the workflow

Adjusting AI Models
- Replace Anthropic Claude with other LLM providers in the agent nodes
- Modify system prompts for different content styles or languages
- Adjust character limits for meta elements in the structured output parser

Modifying Competitor Analysis
- Change number of competitors analyzed (currently 5) by adding/removing Scrape nodes
- Adjust scraping parameters in Firecrawl nodes for different content types
- Modify heading extraction logic in JavaScript Code nodes

Customizing Output Format
- Update Google Sheets column mapping in the final Code node
- Modify structured output parser schema for different data structures
- Change batch processing size in Split in Batches node

Adding Quality Controls
- Insert validation nodes between workflow phases
- Add error handling and retry logic to critical nodes
- Implement content quality scoring mechanisms

Extending Functionality
- Add keyword research capabilities with additional APIs
- Include image optimization suggestions
- Integrate social media content generation
- Connect to CMS platforms for direct publishing

Best Practices

Setup and Testing
- Always test with small batches before processing large keyword lists
- Monitor API usage and costs across all services
- Regularly update system prompts based on output quality
- Maintain clean data in your Google Sheets template

Content Quality
- Review generated content before publishing
- Customize system prompts to match your brand voice
- Use descriptive node names for easier
workflow maintenance
- Keep competitor analysis current by running regularly

Performance Optimization
- Process keywords in small batches to avoid timeouts
- Set appropriate retry policies for external API calls
- Monitor workflow execution times and optimize bottlenecks

Troubleshooting

Common Issues and Solutions

API Errors
- Check credential configuration in n8n settings
- Verify API usage limits and billing status
- Ensure proper authentication for each service

Scraping Failures
- Firecrawl nodes have error handling enabled to continue on failures
- Some websites may block scraping - this is normal behavior
- Check if competitor URLs are accessible and valid

Empty Results
- Verify keyword formatting in Google Sheets
- Ensure competitor websites contain the expected content structure
- Check if meta tags are properly formatted in system prompts

Sheet Update Errors
- Ensure proper column mapping in final Code node
- Verify Google Sheets permissions and sharing settings
- Check that target sheet names match exactly

Processing Stops
- Review batch processing limits and timeout settings
- Check for errors in individual nodes using execution logs
- Verify all required fields are populated in input data

Template Structure

Required Sheets
- Client Information: Business details and configuration
- SEO: Target keywords and page information
- Results Sheet: Where generated content will be written

Expected Columns
- **Keywords**: Target search terms
- **Description**: Brief page description
- **Type de page**: Page category
- **Awareness level**: User familiarity level
- **title, meta-desc, h1, brief**: Generated output columns

This streamlined version provides all essential SEO content generation capabilities while being easier to set up and maintain than the advanced version with vector database integration.
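Phase 2 of the workflow above extracts heading hierarchies (H1-H6) from the Markdown of scraped competitor pages. The template does this in JavaScript Code nodes; the Python sketch below only illustrates the idea and is not the template's actual logic. It assumes ATX-style headings (`#` through `######`) and skips fenced code blocks so that comment lines inside code are not mistaken for headings.

```python
import re

# One to six '#' characters, whitespace, then the heading text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+)$")
FENCE = "`" * 3  # Markdown code-fence marker


def extract_headings(markdown: str) -> list[tuple[int, str]]:
    """Return (level, text) for every ATX heading outside fenced code blocks."""
    headings = []
    in_fence = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(stripped)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings
```

The resulting list of levels and texts is the kind of compact outline that can be passed to the model in place of the full page content.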
by Avkash Kakdiya
How it works This workflow enriches and personalizes your lead profiles by integrating HubSpot contact data, scraping social media information, and using AI to generate tailored outreach emails. It streamlines the process from contact capture to sending a personalized email — all automatically. The system fetches new or updated HubSpot contacts, verifies and enriches their Twitter/LinkedIn data via Phantombuster, merges the profile and engagement insights, and finally generates a customized email ready for outreach. Step-by-step 1. Trigger & Input HubSpot Contact Webhook: Fires when a contact is created or updated in HubSpot. Fetch Contact: Pulls the full contact details (email, name, company, and social profiles). Update Google Sheet: Logs Twitter/LinkedIn usernames and marks their tracking status. 2. Validation Validate Twitter/LinkedIn Exists: Checks if the contact has a valid social profile before proceeding to scraping. 3. Social Media Scraping (via Phantombuster) Launch Profile Scraper & 🎯 Launch Tweet Scraper: Triggers Phantombuster agents to fetch profile details and recent tweets. Wait Nodes: Ensures scraping completes (30–60 seconds). Fetch Profile/Tweet Results: Retrieves output files from Phantombuster. Extract URL: Parses the job output to extract the downloadable .json or .csv data file link. 4. Data Download & Parsing Download Profile/Tweet Data: Downloads scraped JSON files. Parse JSON: Converts the raw file into structured data for processing. 5. Data Structuring & Merging Format Profile Fields: Maps stats like bio, followers, verified status, likes, etc. Format Tweet Fields: Captures tweet data and associates it with the lead’s email. Merge Data Streams: Combines tweet and profile datasets. Combine All Data: Produces a single, clean object containing all relevant lead details. 6. AI Email Generation & Delivery Generate Personalized Email: Feeds the merged data into OpenAI GPT (via LangChain) to craft a custom HTML email using your brand details. 
Parse Email Content: Cleans AI output into structured subject and body fields. Sends Email: Automatically delivers the personalized email to the lead via Gmail. Benefits Automated Lead Enrichment — Combines CRM and real-time social media data with zero manual research. Personalized Outreach at Scale — AI crafts unique, relevant emails for each contact. Improved Engagement Rates — Targeted messages based on actual social activity and profile details. Seamless Integration — Works directly with HubSpot, Google Sheets, Gmail, and Phantombuster. Time & Effort Savings — Replaces hours of manual lookup and email drafting with an end-to-end automated flow.
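The Extract URL step described above parses the scraper's job output to find the link to the downloadable .json or .csv file. A minimal Python sketch of that idea follows. It assumes the job output is plain text containing the file URL somewhere in it; the actual Phantombuster output format is not specified in this description, and the function name and example URLs are hypothetical.

```python
import re
from typing import Optional

# Matches an http(s) URL that ends in .json or .csv.
DATA_LINK_RE = re.compile(r"https?://[^\s\"'<>]+?\.(?:json|csv)\b", re.IGNORECASE)


def extract_data_url(job_output: str, prefer: str = "json") -> Optional[str]:
    """Return the first link with the preferred extension, else any data link, else None."""
    links = DATA_LINK_RE.findall(job_output)
    for link in links:
        if link.lower().endswith("." + prefer):
            return link
    return links[0] if links else None
```

Returning None when no link is found lets the next node decide whether to retry the fetch or skip the contact.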