by Strategiflows
Who Is This For?
E-commerce managers, data analysts, and n8n beginners who need a hands-off way to pull all Shopify orders (even from stores with thousands of orders) into Google Sheets for reporting or BI.

What Problem Does It Solve?
Shopify’s GraphQL API returns at most 250 orders per call, which forces you to manage cursors and loops manually. This template handles the “get next 250” logic for you, so you never miss an order.

What This Workflow Does
- Schedule Trigger: Runs at your chosen cadence (daily, hourly, or manual).
- Set Date Range: Defines startDay and endDay based on $now.
- GraphQL Loop: Fetches orders 250 at a time, following pageInfo.hasNextPage and endCursor until every page has been retrieved.
- Code Node: Flattens orders into line-item rows and summarizes them by SKU/vendor.
- Google Sheets: Appends the results to your sheet for easy analysis.
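The GraphQL Loop can be sketched in plain JavaScript as follows. This is a minimal illustration, not the template's actual nodes. `fetchAllOrders` and `shopifyFetchPage` are hypothetical helpers, and the API version and selected fields are assumptions you should adapt. The loop passes the previous page's `endCursor` as `after` and stops once `pageInfo.hasNextPage` is false.

```javascript
// Minimal sketch of cursor-based pagination over Shopify's GraphQL Admin API.
// Hypothetical helpers; the API version and the selected fields are assumptions.
const ORDERS_QUERY = `
  query Orders($cursor: String, $search: String) {
    orders(first: 250, after: $cursor, query: $search) {
      edges { node { id name createdAt } }
      pageInfo { hasNextPage endCursor }
    }
  }`;

// Returns a function that fetches one page (one "orders" connection) for a cursor.
// `shop` is the store domain, e.g. "your-store.myshopify.com".
function shopifyFetchPage({ shop, token, apiVersion = "2024-10", search }) {
  const endpoint = `https://${shop}/admin/api/${apiVersion}/graphql.json`;
  return async (cursor) => {
    const res = await fetch(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json", "X-Shopify-Access-Token": token },
      body: JSON.stringify({ query: ORDERS_QUERY, variables: { cursor, search } }),
    });
    const payload = await res.json();
    if (payload.errors) throw new Error(JSON.stringify(payload.errors));
    return payload.data.orders;
  };
}

// Keeps requesting pages until pageInfo.hasNextPage is false.
async function fetchAllOrders(fetchPage) {
  const orders = [];
  let cursor = null; // the first request has no cursor
  let hasNextPage = true;
  while (hasNextPage) {
    const connection = await fetchPage(cursor);
    for (const edge of connection.edges) orders.push(edge.node);
    hasNextPage = connection.pageInfo.hasNextPage;
    cursor = connection.pageInfo.endCursor;
  }
  return orders;
}
```

For example, `fetchAllOrders(shopifyFetchPage({ shop, token, search: "created_at:>=2024-01-01 created_at:<2024-02-01" }))` would collect one month of orders. The search string uses Shopify's order query syntax and stands in for the template's startDay/endDay values. Injecting `fetchPage` keeps the loop testable without a live store.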
by Viktor Klepikovskyi
Base64 Encode Multiple Binary Files with a Code Node
This template demonstrates how to handle multiple binary files in n8n by using a Code node to convert each file into a Base64-encoded string. It is particularly useful when an API expects file uploads in this format and the standard 'Extract From File' node is not sufficient for batch processing. The workflow downloads a ZIP file and unzips it to get multiple binary files. A Code node with custom JavaScript then encodes each file individually.

Instructions
1. Download and import this template into your n8n instance.
2. Run the workflow once to see how it downloads, unzips, and encodes multiple files.
3. Modify the 'HTTP Request' node to download your own binary file or a ZIP file containing multiple files.
4. Update the 'Code' node if you need to adjust the output format or file paths.
5. Use the output of the 'Code' node in a subsequent node, such as another 'HTTP Request' node, to send the Base64-encoded files to your target API.

A link to the full blog post is available here
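The encoding step in the Code node might look roughly like the sketch below. It is written as a plain function so it can run outside n8n. It assumes, without confirmation from the template, that the unzipped files arrive as binary properties on each item, and that inside a Code node you would read each buffer with `this.helpers.getBinaryDataBuffer(itemIndex, propertyName)`. The output field names (`property`, `fileName`, `mimeType`, `base64`) are illustrative.

```javascript
// Sketch: turn every binary property on every item into a Base64 string.
// `getBuffer(itemIndex, propertyName)` is injected so the function can be tested
// outside n8n; in a Code node it would wrap this.helpers.getBinaryDataBuffer.
async function encodeAllBinaries(items, getBuffer) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    const binary = items[i].binary || {}; // items without files are skipped
    for (const [property, meta] of Object.entries(binary)) {
      const buffer = await getBuffer(i, property);
      results.push({
        json: {
          property,
          fileName: meta.fileName,
          mimeType: meta.mimeType,
          // Buffer#toString("base64") yields standard, padded Base64.
          base64: Buffer.from(buffer).toString("base64"),
        },
      });
    }
  }
  return results;
}
```

In a Code node set to "Run Once for All Items", a call such as `return await encodeAllBinaries($input.all(), (i, key) => this.helpers.getBinaryDataBuffer(i, key));` would emit one item per file. Check the helper against your n8n version before relying on it.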
by Clown Mutiny
What It Does
The Chef Agent is your AI-powered kitchen companion, ready to turn leftover ingredients into meal inspiration. It's a simple, fun n8n automation that:
- Accepts a list of ingredients via webhook
- Uses Ollama AI to suggest 5 creative recipes or food ideas
- Recommends up to 3 missing ingredients to improve the dish
- Returns a fallback message if the AI is unavailable
- Includes setup notes for beginners

Requirements
- An active n8n instance (local or hosted)
- Ollama AI running locally (or another LLM via HTTP request)
- A webhook endpoint (defaults to /lets-cook)

Why You’ll Love It
- Fully customizable for your use case or favorite LLM
- A great intro to AI + workflow automation
- Comes with playful Clown Mutiny flair: “Powered by Clown Mutiny’s taste-bud liberation division.”

Installation
1. Import the provided JSON template into your n8n workspace.
2. Configure your AI node to match your local Ollama instance.
3. Trigger the flow by sending a POST request to the webhook: { "ingredients": "eggs, rice, spinach" }
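The trigger request from the installation steps can be built programmatically. The sketch below assumes n8n's default port 5678 and an active workflow, so the production URL is `<base>/webhook/lets-cook`. Manual test runs in the editor use `/webhook-test/lets-cook` instead. `buildChefRequest` and `askChef` are hypothetical helper names.

```javascript
// Sketch: build (and optionally send) the POST request that triggers the Chef Agent.
// Assumptions: default n8n port 5678 and an active workflow, so the production
// webhook URL is <base>/webhook/lets-cook (manual test runs use /webhook-test/lets-cook).
function buildChefRequest(ingredients, baseUrl = "http://localhost:5678", path = "/webhook/lets-cook") {
  // Accept either an array or a ready-made comma-separated string.
  const list = Array.isArray(ingredients)
    ? ingredients.map((s) => s.trim()).join(", ")
    : String(ingredients);
  return {
    url: new URL(path, baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ ingredients: list }),
    },
  };
}

// Sends the request with the global fetch (Node 18+).
// Assumes the workflow responds with JSON; adjust if your webhook returns text.
async function askChef(ingredients, baseUrl) {
  const { url, init } = buildChefRequest(ingredients, baseUrl);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Webhook returned HTTP ${res.status}`);
  return res.json();
}
```

For example, `await askChef(["eggs", "rice", "spinach"])` sends the same body shown in step 3 of the installation.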
by Yaron Been
Wan Video Wan 2.2 I2V A14B Video Generator
Description
Image-to-video at 720p and 480p with Wan 2.2 A14B.

Overview
This n8n workflow integrates with the Replicate API to use the wan-video/wan-2.2-i2v-a14b model, which generates a video from an input image and a text prompt.

Features
- Easy integration with the Replicate API
- Automated status checking and result retrieval
- Support for all model parameters
- Error handling and retry logic
- Clean output formatting

Parameters
Required Parameters
- **prompt** (string): Prompt for video generation
- **image** (string): Input image to generate the video from

Optional Parameters
- **seed** (integer, default: None): Random seed. Leave blank for a random seed.
- **num_frames** (integer, default: 81): Number of video frames. 81 frames give the best results.
- **resolution** (string, default: 480p): Resolution of the video. 832x480px corresponds to a 16:9 aspect ratio, and 480x832px to 9:16.
- **sample_shift** (number, default: 5): Sample shift factor.
- **sample_steps** (integer, default: 30): Number of generation steps. Fewer steps mean faster generation at the expense of output quality; 30 steps is sufficient for most prompts.
- **frames_per_second** (integer, default: 16): Frames per second. Note that pricing for this model is based on the video duration at 16 fps.

How to Use
1. Set up your Replicate API key in the workflow.
2. Configure the required parameters for your use case.
3. Run the workflow to generate video content.
4. Access the generated output from the final node.

API Reference
- Model: wan-video/wan-2.2-i2v-a14b
- API Endpoint: https://api.replicate.com/v1/predictions

Requirements
- Replicate API key
- n8n instance
- Basic understanding of video generation parameters
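The "automated status checking and result retrieval" feature follows a create-then-poll pattern. The sketch below shows one way to implement it against the `/v1/predictions` endpoint listed above. It is not the workflow's actual node configuration. `version` (the model version ID from Replicate), the polling interval, and the poll limit are assumptions. `fetchImpl` is injectable so the loop can be tested offline.

```javascript
// Sketch: create a Replicate prediction, then poll it until it reaches a terminal state.
// Assumptions: `version` is the model version ID copied from Replicate;
// `fetchImpl` defaults to the global fetch (Node 18+) and can be swapped for testing.
const TERMINAL_STATUSES = new Set(["succeeded", "failed", "canceled"]);
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runPrediction({ version, input, token, fetchImpl = fetch, intervalMs = 5000, maxPolls = 120 }) {
  const headers = { Authorization: `Bearer ${token}`, "Content-Type": "application/json" };
  const created = await fetchImpl("https://api.replicate.com/v1/predictions", {
    method: "POST",
    headers,
    body: JSON.stringify({ version, input }),
  });
  if (!created.ok) throw new Error(`Create failed with HTTP ${created.status}`);
  let prediction = await created.json();

  // Poll the prediction's own URL until it finishes or we give up.
  for (let i = 0; i < maxPolls && !TERMINAL_STATUSES.has(prediction.status); i++) {
    await sleep(intervalMs);
    const polled = await fetchImpl(prediction.urls.get, { headers });
    prediction = await polled.json();
  }
  if (!TERMINAL_STATUSES.has(prediction.status)) {
    throw new Error(`Prediction ${prediction.id} still ${prediction.status} after ${maxPolls} polls`);
  }
  return prediction; // on success, prediction.output holds the result (e.g. a video URL)
}
```

For this model, `input` would carry `prompt` and `image`, plus any optional fields such as `num_frames: 81` or `resolution: "480p"`. A caller should still check `prediction.status === "succeeded"` before using `output`.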
by Yaron Been
IBM Granite Speech 3.3 8B Text Generator
Description
Granite-speech-3.3-8b is a compact and efficient speech-language model, specifically designed for automatic speech recognition (ASR) and automatic speech translation (AST).

Overview
This n8n workflow integrates with the Replicate API to use the ibm-granite/granite-speech-3.3-8b model, which generates text (transcriptions or translations) from your audio and prompt inputs.

Features
- Easy integration with the Replicate API
- Automated status checking and result retrieval
- Support for all model parameters
- Error handling and retry logic
- Clean output formatting

Parameters
Optional Parameters
- **seed** (integer, default: None): Random seed. Leave blank to randomize the seed.
- **audio** (array, default: None): Audio inputs for the model.
- **top_k** (integer, default: 50): The number of highest-probability tokens to consider when generating output. If > 0, only the top k tokens with the highest probability are kept (top-k filtering).
- **top_p** (number, default: 0.9): A probability threshold for generating output. If < 1.0, only the top tokens with cumulative probability >= top_p are kept (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
- **prompt** (string, default: empty): User prompt to send to the model.
- **max_tokens** (integer, default: 512): The maximum number of tokens the model should generate as output.
- **min_tokens** (integer, default: 0): The minimum number of tokens the model should generate as output.
- **temperature** (number, default: 0.6): The value used to modulate the next-token probabilities.
- **chat_template** (string, default: None): A template to format the prompt with. If not provided, the default prompt template is used.
- **system_prompt** (string, default: None): System prompt to send to the model. The chat template provides a good default.
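To make the top_k and top_p descriptions concrete, the sketch below applies both filters to a toy probability distribution. It only illustrates the definitions given above: keep the k most likely tokens, then keep the smallest prefix whose cumulative probability reaches top_p, then renormalize. It is not the sampling code that Replicate or IBM actually runs. Applying top-k before top-p is an assumption about ordering.

```javascript
// Illustrative top-k then top-p (nucleus) filtering over token probabilities.
// probs[i] is the probability of token i; returns kept tokens with renormalized p.
function filterTopKTopP(probs, topK = 50, topP = 0.9) {
  // Sort tokens from most to least likely.
  let candidates = probs
    .map((p, token) => ({ token, p }))
    .sort((a, b) => b.p - a.p);

  // top_k > 0: keep only the k most likely tokens.
  if (topK > 0) candidates = candidates.slice(0, topK);

  // top_p < 1.0: keep the smallest prefix whose cumulative probability reaches top_p.
  if (topP < 1.0) {
    const kept = [];
    let cumulative = 0;
    for (const c of candidates) {
      kept.push(c);
      cumulative += c.p;
      if (cumulative >= topP) break;
    }
    candidates = kept;
  }

  // Renormalize so the surviving probabilities sum to 1.
  const total = candidates.reduce((sum, c) => sum + c.p, 0);
  return candidates.map((c) => ({ token: c.token, p: c.p / total }));
}
```

With probabilities `[0.5, 0.3, 0.15, 0.05]`, `top_k = 3`, and `top_p = 0.75`, only tokens 0 and 1 survive, because 0.5 + 0.3 already reaches 0.75. Their renormalized probabilities are 0.625 and 0.375.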
How to Use
1. Set up your Replicate API key in the workflow.
2. Configure the parameters for your use case.
3. Run the workflow to generate text content.
4. Access the generated output from the final node.

API Reference
- Model: ibm-granite/granite-speech-3.3-8b
- API Endpoint: https://api.replicate.com/v1/predictions

Requirements
- Replicate API key
- n8n instance
- Basic understanding of text generation parameters
by Yaron Been
Ndreca Hunyuan3D 2 Test AI Generator
Overview
This n8n workflow integrates with the Replicate API to use the ndreca/hunyuan3d-2-test model, which generates a 3D mesh from an input image.

Features
- Easy integration with the Replicate API
- Automated status checking and result retrieval
- Support for all model parameters
- Error handling and retry logic
- Clean output formatting

Parameters
Required Parameters
- **image** (string): Input image for generating the 3D shape

Optional Parameters
- **seed** (integer, default: 1234): Random seed for generation
- **steps** (integer, default: 50): Number of inference steps
- **num_chunks** (integer, default: 200000): Number of chunks for mesh generation
- **max_facenum** (integer, default: 40000): Maximum number of faces for mesh generation
- **guidance_scale** (number, default: 5.5): Guidance scale for generation
- **octree_resolution** (string, default: 512): Octree resolution for mesh generation
- **remove_background** (boolean, default: True): Whether to remove the background from the input image

How to Use
1. Set up your Replicate API key in the workflow.
2. Configure the required parameters for your use case.
3. Run the workflow to generate 3D content.
4. Access the generated output from the final node.

API Reference
- Model: ndreca/hunyuan3d-2-test
- API Endpoint: https://api.replicate.com/v1/predictions

Requirements
- Replicate API key
- n8n instance
- Basic understanding of 3D generation parameters
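"Support for all model parameters" mostly comes down to assembling a valid `input` object. The sketch below merges user values over the defaults listed above, enforces the one required field (`image`), and rejects misspelled parameter names. The helper name and the strict unknown-key check are illustrative choices, not part of the template. Sending the defaults explicitly is optional, since omitted fields presumably fall back to the model's own defaults.

```javascript
// Sketch: assemble the `input` object for ndreca/hunyuan3d-2-test.
// Defaults mirror the documented parameter list; the helper itself is hypothetical.
const HUNYUAN3D_DEFAULTS = Object.freeze({
  seed: 1234,
  steps: 50,
  num_chunks: 200000,
  max_facenum: 40000,
  guidance_scale: 5.5,
  octree_resolution: "512", // documented as a string
  remove_background: true,
});

function buildHunyuan3dInput(userInput = {}) {
  // `image` is the only required parameter.
  if (typeof userInput.image !== "string" || userInput.image.length === 0) {
    throw new Error("`image` (string) is required");
  }
  // Catch typos such as "stepz" before they are silently ignored by the API.
  const unknown = Object.keys(userInput).filter(
    (k) => k !== "image" && !(k in HUNYUAN3D_DEFAULTS)
  );
  if (unknown.length > 0) {
    throw new Error(`Unknown parameter(s): ${unknown.join(", ")}`);
  }
  // User-supplied values override the defaults.
  return { ...HUNYUAN3D_DEFAULTS, ...userInput };
}
```

The resulting object is what would go in the `input` field of the request body sent to the predictions endpoint.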
by Varritech
Workflow: Auto-Ticket Maker ⚡

About the Creators
This workflow was created by Varritech Technologies, an innovative agency that leverages AI to engineer, design, and deliver software development projects 500% faster than traditional agencies. Based in New York City, we specialize in custom software development, web applications, and digital transformation solutions. If you need assistance implementing this workflow or have questions about content management solutions, please reach out to our team.

🏗️ Architecture Overview
This workflow transforms your Slack conversations into complete project tickets, effectively replacing the need for a dedicated PM for task creation:
1. Slack Webhook → Captures team conversation
2. Code Transformation → Parses Slack message structure
3. AI PM Agent → Analyzes requirements and creates complete tickets
4. Memory Buffer → Maintains conversation context
5. Slack Output → Returns formatted tickets to your channel

Say goodbye to endless PM meetings just to create tickets! Simply describe what you need in Slack, and our AI PM handles the rest, breaking down complex projects into structured epics and tasks with all the necessary details.

📦 Node-by-Node Breakdown

```mermaid
flowchart LR
    A[Webhook: Slack Trigger] --> B[Code: Parse Message]
    B --> C[AI PM Agent]
    C --> D[Slack: Post Tickets]
    E[Memory Buffer] --> C
    F[OpenAI Model] --> C
```

Webhook: Slack Trigger
- Type: HTTP Webhook (POST /slack-ticket-maker)
- Purpose: Captures messages from your designated Slack channel.
Code Transformation
- Function: Parses the complex Slack payload structure
- Extracts: User ID, channel, message text, timestamp, thread information

AI PM Agent
- Inputs: Parsed Slack message
- Process: Evaluates project complexity, requests a project name if needed, asks clarifying questions (up to 2 rounds), breaks the work down into epics and tasks, and formats them with a comprehensive structure
- Ticket Structure: Title, Description, Objectives/Goals, Definition of Done, Requirements/Acceptance Criteria, Implementation Details, Risks & Challenges, Testing & Validation, Timeline & Milestones, Related Notes & References, Open Questions

Memory Buffer
- Type: Window Buffer Memory
- Purpose: Maintains context across the conversation

Slack Output
- Posts fully formatted tickets back to your channel
- Uses markdown for clean, structured presentation

🔍 Design Rationale & Best Practices
Replace Your PM's Ticket Creation Time
Let your PM focus on strategy while AI handles the documentation. Cut ticket creation time by 90%.

Standardized Quality
Every ticket follows best practices with consistent structure, detail level, and formatting.

No Training Required
Describe your needs conversationally; the AI adapts to your communication style.

Seamless Integration
Works within your existing Slack workflow, with no new tools to learn.
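The Code Transformation step can be approximated as follows. This sketch assumes the Slack app delivers Events API payloads, meaning an `event_callback` envelope with the message under `event`, plus the one-time `url_verification` challenge. It also assumes you pass in the request body from the Webhook node. The template's actual parsing code may differ. The `isBot` flag is an added suggestion that lets the workflow ignore its own posted tickets.

```javascript
// Sketch of the "Code Transformation" step for a Slack Events API payload.
// Assumption: `body` is the request body received by the Webhook node.
function parseSlackPayload(body) {
  // Slack's one-time endpoint check: echo the challenge back.
  if (body.type === "url_verification") {
    return { challenge: body.challenge };
  }
  const event = body.event || {};
  return {
    userId: event.user ?? null,
    channel: event.channel ?? null,
    text: event.text ?? "",
    timestamp: event.ts ?? null,
    // Replies carry thread_ts; a top-level message starts a thread at its own ts.
    threadTs: event.thread_ts ?? event.ts ?? null,
    // Messages posted by bots (including this workflow) carry bot_id.
    isBot: Boolean(event.bot_id),
  };
}
```

Replying with `thread_ts` set to `threadTs` keeps the AI PM's clarifying questions and tickets in the same Slack thread as the original request.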
by scrapeless official
Brief Overview
This automation template helps you track the latest real estate listings from the LoopNet platform. Scrapeless scrapes the property listings, n8n orchestrates the workflow, and Google Sheets stores the results. Together they form a real estate data pipeline that runs automatically on a weekly schedule.

How It Works
- **Trigger on a Schedule:** The workflow runs automatically every week (adjustable to every 6 hours, daily, etc.).
- **Scrape Property Listings:** Scrapeless crawls the LoopNet real estate website and returns structured Markdown data.
- **Extract & Parse Content:** JavaScript nodes use regular expressions to parse property titles, links, sizes, and year built from the Markdown.
- **Flatten Data:** Each property listing becomes a single row with structured fields.
- **Save to Google Sheets:** Property data is appended to your Google Sheet for easy analysis, sharing, and reporting.

Features
- No-code, automated real estate listing scraper.
- Scrapes and structures the latest commercial property listings (for sale or lease).
- Saves structured listing data directly to Google Sheets.
- Fully automated, scheduled scraping; no manual scraping is required.
- Extensible: add filters, deduplication, Slack/email notifications, or multi-city scraping.

Requirements
- **Scrapeless API Key:** Sign up on the Scrapeless Dashboard, go to Settings → API Key Management → Create API Key, then copy the generated key.
- **n8n Instance:** Self-hosted or an n8n Cloud account.
- **Google Account:** For Google Sheets API access.
- **Target Site:** This template is configured for LoopNet real estate listings but can be adapted for other property platforms such as Crexi.

Installation
1. Deploy n8n on your preferred platform.
2. Install the Scrapeless node from the community marketplace.
3. Import this workflow JSON file into your n8n workspace.
4. Create and add your Scrapeless API key in n8n’s credential manager.
5. Connect your Google Sheets account in n8n.
6. Update the target LoopNet URL and Google Sheet details.
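The Extract & Parse and Flatten steps could be implemented along the lines of the sketch below. The Markdown shape it expects is a hypothetical example: each listing starts with a `[Title](https://www.loopnet.com/...)` link and is followed by text such as `12,500 SF | Built in 1998`. The real output from Scrapeless for LoopNet pages may differ, so treat the patterns as a starting point. `parseListings` is an illustrative name.

```javascript
// Sketch: pull listing fields out of scraped Markdown with regular expressions.
// The Markdown shape is a hypothetical example; adjust patterns to your real output.
function parseListings(markdown) {
  // Each listing is assumed to start with a Markdown link to a LoopNet URL.
  const linkRe = /\[([^\]]+)\]\((https?:\/\/[^\s)]*loopnet\.com[^\s)]*)\)/g;
  const matches = [...markdown.matchAll(linkRe)];
  return matches.map((m, i) => {
    // The listing's details are the text between this link and the next one.
    const start = m.index + m[0].length;
    const end = i + 1 < matches.length ? matches[i + 1].index : markdown.length;
    const block = markdown.slice(start, end);
    const size = block.match(/([\d,]+)\s*SF/i);
    const year = block.match(/Built in\s*(\d{4})/i);
    // One flat row per listing, ready to append to Google Sheets.
    return {
      title: m[1].trim(),
      link: m[2],
      sizeSqFt: size ? Number(size[1].replace(/,/g, "")) : null,
      yearBuilt: year ? Number(year[1]) : null,
    };
  });
}
```

Returning `null` for missing fields keeps every row's column set identical, which makes appending to a sheet predictable.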
Usage
This automated real estate scraper is ideal for:

| Industry / Role        | Use Case                                                          |
| ---------------------- | ----------------------------------------------------------------- |
| Real Estate Agencies   | Monitor new commercial properties and streamline lead generation. |
| Market Research Teams  | Track market dynamics and property availability in real time.     |
| BI/Data Analysts       | Automate data collection for dashboards and market insights.      |
| Investors              | Keep tabs on the latest commercial property opportunities.        |
| Automation Enthusiasts | Example use case for learning web scraping + automation.          |
by Salman Mehboob
💡 What this workflow does
Type #Audit https://clientsite.com in Slack and walk away. You get a professional PDF report and a structured Excel fix sheet, delivered to Google Drive and posted back in your Slack thread. The whole process is fully automated, with zero manual work. Built for SEO agencies and freelancers who deliver technical audits at scale.

📦 What You Get
- **20–50 page PDF report:** Includes a health score, Core Web Vitals, broken links, on-page SEO, issue cards with fix explanations, a priority action plan, and an SEO glossary.
- **Excel fix sheet:** 18 tabs (one per issue type) containing every URL and exact fix instructions, ready to hand off to a developer.
- **Cloud Storage:** Both files are saved to your Google Drive /SEO Audits folder automatically.
- **Slack Delivery:** Links to the generated files are posted directly back into your original Slack thread.

⚙️ How It Works
1. A Slack message containing #Audit and a URL triggers the workflow.
2. A POST request starts the crawl via your self-hosted Screaming Frog CLI API. The API is built with Python + FastAPI and exposed publicly through a Cloudflare Tunnel, so no static IP is needed.
3. The workflow polls the API every 2 minutes until the status is done, timeout, or failed.
4. The crawl ZIP is downloaded and decompressed, which triggers two branches:
   - Branch A (Executive Report): SEO Audit Parser → PageSpeed Insights → PSI Parser → Report Builder → HTML to PDF → Google Drive.
   - Branch B (Developer Sheet): Full Data Parser → Tab Builder → Excel File Builder → Google Drive.
5. Both Google Drive links are merged, logged to a Google Sheet, and posted back into the Slack thread.

🛠️ Key Technical Features
- **Streaming CSV Parser:** Handles large crawl files up to 500MB without causing timeouts or memory crashes.
- **Smart File Optimization:** Auto-skips the massive all_inlinks.csv and all_anchor_text.csv files (reducing the unzipped payload from 450MB to 15MB).
- **Advanced 403 Error Logic:** Splits 403 errors into three actionable categories:
  - Your own site blocking the crawler (WAF/Cloudflare)
  - Known bot-blocked platforms (Twitter, Wikipedia)
  - Unknown external links requiring a manual check
- **Archive Page Filtering:** Removes tags, pagination, feeds, and search pages from all issue counts so your data isn't skewed.
- **Dynamic Health Score (0–100):** Weighted algorithmically by critical vs. warning issues.
- **Universal CMS Support:** Works with WordPress, Blogger, Shopify, and any other crawlable CMS.

📋 What You Need

| Requirement | Notes |
| :--- | :--- |
| Screaming Frog SEO Spider | Licensed version (£199/year) required |
| SF CLI API (Python/FastAPI) | Not included in this template; contact the creator below |
| Cloudflare Tunnel | Free; exposes your local API publicly |
| Google PageSpeed API Key | Free from Google Cloud Console |
| PDF Conversion Service | Workflow is pre-configured for pdfendpoint.com |
| Google Drive & Sheets | OAuth2 credentials for file storage and audit logging |
| Slack | OAuth2 credentials for the trigger and delivery |
| n8n (Self-Hosted) | Recommended for long execution times |

⚠️ Required n8n Environment Variables
To ensure the workflow can process large sites without timing out, add these to your n8n .env file:

```
N8N_DEFAULT_BINARY_DATA_MODE=filesystem
EXECUTIONS_TIMEOUT=7200
N8N_RUNNERS_TASK_TIMEOUT=7200
NODE_OPTIONS=--max-old-space-size=8192
```

🚀 Quick Setup
1. **API Setup:** Deploy the SF CLI API on your Windows machine/server and expose it via Cloudflare Tunnel.
2. **Link the Crawler:** Update the Start Crawl node URL and Bearer token with your tunnel URL and API secret.
3. **PageSpeed Insights:** Replace the PSI API key in the Page Speed Insights HTTP node with your own.
4. **Cloud Storage:** Set your Google Drive Folder ID in both upload nodes, and your Google Sheets Spreadsheet ID in the Append row node.
5. **Branding:** In the Report Builder code node, update the AGENCY config object (around line 125) with your name, email, and tagline. In the Input Cleaner node, set the watermark value to your agency's name.

📞 Support & Custom API Access
Note: The custom Python/FastAPI backend required to orchestrate Screaming Frog is not included in this JSON template. If you want to set up the complete system, please reach out:
📧 Email: salmanmehboob1947@gmail.com
🔗 LinkedIn: Salman Mehboob
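The three-way 403 split listed under Key Technical Features can be sketched as a small classifier. The bot-blocker domain list and the hostname matching below are illustrative assumptions, not the template's actual rules. A real implementation would likely maintain a longer list and might also inspect response headers to detect a WAF.

```javascript
// Sketch of three-way triage for URLs that returned HTTP 403.
// The bot-blocker list and hostname matching are illustrative assumptions.
const KNOWN_BOT_BLOCKERS = ["twitter.com", "x.com", "wikipedia.org"];

// True when `host` equals `domain` or is a subdomain of it.
function hostMatches(host, domain) {
  return host === domain || host.endsWith(`.${domain}`);
}

function classify403(url, siteUrl) {
  const host = new URL(url).hostname.toLowerCase();
  const siteHost = new URL(siteUrl).hostname.toLowerCase().replace(/^www\./, "");
  if (hostMatches(host, siteHost)) return "own-site-blocking"; // likely WAF/Cloudflare
  if (KNOWN_BOT_BLOCKERS.some((d) => hostMatches(host, d))) return "known-bot-blocked";
  return "manual-check";
}

// Groups a list of 403 URLs into the three buckets.
function triage403(urls, siteUrl) {
  const buckets = { "own-site-blocking": [], "known-bot-blocked": [], "manual-check": [] };
  for (const url of urls) buckets[classify403(url, siteUrl)].push(url);
  return buckets;
}
```

Matching on the hostname suffix means subdomains such as `en.wikipedia.org` or `www.clientsite.com` fall into the right bucket without listing every variant.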
by Hemanth Arety
Generate AEO strategy from brand input using AI competitor analysis
This workflow automatically creates a comprehensive Answer Engine Optimization (AEO) strategy. It identifies your top competitors, analyzes their positioning, and generates custom recommendations to help your brand rank in AI-powered search engines like ChatGPT, Perplexity, and Google SGE.

Who it's for
This template is perfect for:
- **Digital marketing agencies** offering AEO services to clients
- **In-house marketers** optimizing content for AI search engines
- **Brand strategists** analyzing competitive positioning
- **Content teams** creating AI-optimized content strategies
- **SEO professionals** expanding into Answer Engine Optimization

What it does
The workflow automates the entire AEO research and strategy process in 6 steps:
1. Collects brand information via a user-friendly web form (brand name, website, niche, product type, email)
2. Identifies the top 3 competitors using Google Gemini AI, based on product overlap, market position, digital presence, and geographic factors
3. Scrapes the target brand's website with Firecrawl to extract value propositions, features, and content themes
4. Scrapes competitor websites in parallel to gather competitive intelligence
5. Generates a comprehensive AEO strategy using OpenAI GPT-4, with 15+ actionable recommendations
6. Delivers a formatted report via email with an executive summary, competitive analysis, and implementation roadmap

The entire process runs automatically and takes approximately 5-7 minutes to complete.
How to set up
Requirements
You'll need API credentials for:
- **Google Gemini API** (for competitor analysis)
- **OpenAI API** (for strategy generation)
- **Firecrawl API** (for web scraping)
- **Gmail account** (for email delivery), using OAuth2 authentication

Setup Steps
1. Import the workflow into your n8n instance.
2. Configure credentials:
   - Add your Google Gemini API key to the "Google Gemini Chat Model" node
   - Add your OpenAI API key to the "OpenAI Chat Model" node
   - Add your Firecrawl API key as HTTP Header Auth credentials
   - Connect your Gmail account using OAuth2
3. Activate the workflow and copy the form webhook URL.
4. Test the workflow by submitting a real brand through the form.
5. Check your email for the generated AEO strategy report.

Credentials Setup Tips
- For Firecrawl: Create HTTP Header Auth credentials with header name Authorization and value Bearer YOUR_API_KEY
- For Gmail: Use OAuth2 to avoid authentication issues with 2FA
- Test each API credential individually before running the full workflow

How it works
Competitor Identification
The Google Gemini AI agent analyzes your brand based on 4 weighted criteria: product/service overlap (40%), market position (30%), digital presence (20%), and geographic overlap (10%). It returns structured JSON data with competitor names, URLs, overlap percentages, and detailed reasoning.

Web Scraping
Firecrawl extracts structured data from websites using custom schemas. For each site, it captures the company name, products/services, value proposition, target audience, key features, pricing info, and content themes. This runs asynchronously, with 60-second waits to allow extraction to complete.
Strategy Generation
OpenAI GPT-4 analyzes the combined brand and competitor data and generates a comprehensive report that includes:
- an executive summary
- a competitive analysis
- 15+ specific AEO tactics across 4 categories (content optimization, structural improvements, authority building, answer engine targeting)
- a content priority matrix with 10 ranked topics
- a detailed implementation roadmap

Email Delivery
The strategy is formatted as a professional HTML email with clear sections, visual hierarchy, and actionable next steps. Recipients get an immediately implementable roadmap for improving their AEO performance.

How to customize the workflow
Change AI Models
- **Replace Google Gemini** with Claude, GPT-4, or another LLM in the competitor analysis node
- **Replace OpenAI** with Anthropic Claude or Google Gemini in the strategy generation node
- Both use LangChain agent nodes, making model swapping straightforward

Modify Competitor Analysis
- **Find more competitors**: Edit the AI prompt to request 5 or 10 competitors instead of 3
- **Add filtering criteria**: Include factors like company size, funding stage, or geographic focus
- **Change ranking weights**: Adjust the 40/30/20/10 weighting in the prompt

Enhance Data Collection
- **Add social media scraping**: Include LinkedIn, Twitter/X, or Facebook page analysis
- **Pull review data**: Integrate G2, Capterra, or Trustpilot APIs for customer sentiment
- **Include traffic data**: Add SimilarWeb or Semrush API calls for competitive metrics

Change Output Format
- **Export to Google Docs**: Replace Gmail with the Google Docs node to create shareable documents
- **Send to Slack/Discord**: Post strategy summaries to team channels for collaboration
- **Save to database**: Store results in Airtable, PostgreSQL, or MongoDB for tracking
- **Create presentations**: Generate PowerPoint slides using automation tools

Add More Features
- **Schedule periodic analysis**: Run monthly competitive audits for specific brands
- **A/B test strategies**: Generate multiple strategies and compare results over time
- **Multi-language support**: Add translation nodes for international brands
- **Custom branding**: Modify email templates with your agency's logo and colors

Adjust Scraping Behavior
- **Change Firecrawl schema**: Customize extracted data fields based on industry needs
- **Add timeout handling**: Implement retry logic for failed scraping attempts
- **Scrape more pages**: Extend beyond the homepage to include blog, pricing, and about pages
- **Use different scrapers**: Replace Firecrawl with Apify, Browserless, or custom solutions

Tips for best results
- **Provide clear brand information**: The more specific the product type and niche, the better the competitor identification
- **Ensure websites are accessible**: Some sites block scrapers; consider adding user agents or rotating IPs
- **Monitor API costs**: Firecrawl and OpenAI charges can add up; set usage limits
- **Review generated strategies**: AI recommendations should be reviewed and customized for your specific context
- **Iterate on prompts**: Fine-tune the AI prompts based on output quality over multiple runs

Common use cases
- **Client onboarding** for marketing agencies: generate initial AEO assessments
- **Content strategy planning**: identify topics and angles competitors are missing
- **Quarterly audits**: track competitive positioning changes over time
- **Product launches**: understand the competitive landscape before entering a market
- **Sales enablement**: equip sales teams with competitive intelligence

Note: This workflow uses community and AI nodes that require external API access. Make sure your n8n instance can make outbound HTTP requests and has the necessary LangChain nodes installed.
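The 40/30/20/10 weighting used for competitor identification is easy to misjudge when you edit it in the prompt. The sketch below shows the arithmetic deterministically. In the workflow itself, Gemini applies these weights as part of its prompt. This code only illustrates the calculation and assumes each criterion is scored on a 0–100 scale. `weightedScore` and `topCompetitors` are hypothetical helpers.

```javascript
// Sketch: combine four criterion scores (assumed 0–100) with the 40/30/20/10 weights.
const WEIGHTS = {
  productOverlap: 0.4,
  marketPosition: 0.3,
  digitalPresence: 0.2,
  geographicOverlap: 0.1,
};

// Weighted sum; a missing criterion counts as 0.
function weightedScore(scores) {
  return Object.entries(WEIGHTS).reduce((sum, [key, w]) => sum + w * (scores[key] ?? 0), 0);
}

// Scores every candidate and keeps the n highest.
function topCompetitors(candidates, n = 3) {
  return candidates
    .map((c) => ({ ...c, score: weightedScore(c.scores) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n);
}
```

For example, scores of 80, 60, 50, and 100 combine to 0.4·80 + 0.3·60 + 0.2·50 + 0.1·100 = 70. If you change the weights, keep them summing to 1 so that scores stay on the same 0–100 scale.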
by Marth
Automated AI-Driven Competitor & Market Intelligence System

**Problem Solved:** Small and medium-sized IT companies often struggle to stay ahead in a rapidly evolving market. Manually tracking competitor moves, pricing changes, product updates, and emerging market trends is time-consuming, inconsistent, and often too slow for agile sales strategies. The result is missed sales opportunities, ineffective pitches, and a reactive rather than proactive market approach.

**Solution Overview:** This n8n workflow automates the continuous collection and AI-powered analysis of competitor data and market trends. It uses web scraping, RSS feeds, and advanced AI models to turn raw data into actionable insights for your sales and marketing teams. The system generates structured reports, notifies relevant stakeholders, and stores intelligence in your database, giving your team regularly updated, strategic information.

**For Whom:** This high-value workflow is perfect for:
- **IT Solution Providers & SaaS Companies:** To maintain a competitive edge and tailor sales pitches based on competitor weaknesses and market opportunities.
- **Sales & Marketing Leaders:** To gain comprehensive, automated market intelligence without extensive manual research.
- **Product Development Teams:** To identify market gaps and validate new feature development based on competitive landscapes and customer sentiment.
- **Business Strategists:** To inform strategic planning with data-driven insights into industry trends and competitive threats.

How It Works (Scope of the Workflow) ⚙️
This system establishes an automated pipeline for market and competitor intelligence:
1. **Scheduled Data Collection:** The workflow runs automatically at predefined intervals (e.g., weekly) and starts retrieving data from various online sources.
2. **Diverse Information Gathering:** It pulls data from competitor websites (pricing, features, and blogs, via web scraping services), from industry news and blogs (via RSS feeds), and potentially from other sources.
3. **Intelligent Data Preparation:** Collected data is aggregated, cleaned, and pre-processed with custom code so it is in an optimal format for AI analysis, with noise removed and relevant text extracted.
4. **AI-Powered Analysis:** An advanced AI model (such as OpenAI's GPT-4o) performs in-depth analysis on the cleaned data. It identifies competitor strengths and weaknesses, new offerings, pricing changes, customer sentiment from reviews, and emerging market trends. It also suggests specific opportunities and threats for your company.
5. **Automated Report Generation:** The AI's structured insights are automatically populated into a professional Google Docs report using a predefined template, making the intelligence easy for your team to digest.
6. **Team Notification:** Stakeholders (sales leads, marketing managers) receive automated notifications via Slack (or email) about the new report and its key insights.
7. **Strategic Data Storage & Utilization:** All analyzed insights are stored in a central database (e.g., PostgreSQL). This builds a historical record for long-term trend analysis. It can also optionally trigger sub-workflows that generate personalized sales talking points for ongoing deals or specific prospects.

Setup Steps 🛠️ (Building the Workflow)
To implement this workflow in your n8n instance, follow these steps:

1. **Prepare Your Digital Assets & Accounts**
   - Google Sheet (optional, if used for CRM data): For a simple CRM, create a sheet with the columns CompetitorName, LastAnalyzedDate, Strengths, Weaknesses, Opportunities, Threats, SalesTalkingPoints.
   - API Keys & Credentials:
     - OpenAI API Key: Essential for the AI analysis.
     - Web Scraping Service API Key: For services like Apify, Crawlbase, or similar (e.g., Bright Data, ScraperAPI).
     - Database Access: Credentials for your PostgreSQL/MySQL database. Make sure you have created the necessary tables (competitor_profiles, market_trends) with appropriate columns.
     - Google Docs Credential: To link n8n to your Google Drive for report generation. Create a template Google Doc with placeholders (e.g., {{competitorName}}, {{strengths}}).
     - Slack Credential: For sending team notifications to specific channels.
     - CRM API Key (optional): If integrating directly with HubSpot, Salesforce, or a custom CRM via API.
2. **Identify Data Sources for Intelligence**
   - Compile a list of competitor website URLs you want to monitor (e.g., pricing pages, blog sections, news).
   - Identify relevant online review platforms (e.g., G2, Capterra) for competitor products.
   - Gather RSS feed URLs from key industry news sources, tech blogs, and competitors' own blogs.
   - Define keywords for general market trends or competitor mentions, if you use tools that provide RSS feeds (like Google Alerts).
3. **Build the n8n Workflow (10 Key Nodes)**
   Start a new workflow in n8n and add the following nodes, configuring their parameters and connections carefully:
   - **Cron (Scheduled Analysis Trigger):** Set this to trigger daily or weekly at a specific time (e.g., Every Week, At Hour: 0, At Minute: 0). In current n8n versions, this node is called Schedule Trigger.
   - **HTTP Request (Fetch Competitor Web Data):** Configure this node to call your chosen web scraping service's API. Set Method to POST and URL to the service's API endpoint. Build the JSON/Raw Body with the startUrls (competitor websites, review sites) to scrape, and provide your API key via Authentication (e.g., Header Auth).
   - **RSS Feed (Fetch News & Blog RSS):** Add the URLs of competitor blogs and industry news RSS feeds.
   - **Merge (Combine Data Sources):** Connect inputs from both Fetch Competitor Web Data and Fetch News & Blog RSS. Use Merge By Position.
   - **Code (Pre-process Data for AI):** Write JavaScript code that iterates through the merged items, extracts relevant text content, performs basic cleaning (e.g., HTML stripping), and limits text length for AI input. The output should be an array of objects with content, title, url, and source.
   - **OpenAI (AI Analysis & Competitor Insights):** Select your OpenAI credential. Set Resource to Chat Completion and Model to gpt-4o. In Messages, create a System message that defines the AI's role, and a User message containing the dynamic prompt (referencing {{ $json.map(item => ... ).join('\\n\\n') }} for content, title, url, source) that requests structured JSON output for the analysis. Set Output to Raw Data.
   - **Google Docs (Generate Market Intelligence Report):** Select your Google Docs credential. Set Operation to Create document from template. Provide your Template Document ID and map the Values from the parsed AI output (using JSON.parse($json.choices[0].message.content).PropertyName) to your template placeholders.
   - **Slack (Sales & Marketing Team Notification):** Select your Slack credential. Set Chat ID to your team's Slack channel ID. Compose the Text message, referencing the report link ({{ $json.documentUrl }}) and key AI insights (e.g., {{ JSON.parse($json.choices[0].message.content).Competitor_Name }}).
   - **PostgreSQL (Store Insights to Database):** Select your PostgreSQL credential. Set Operation to Execute Query. Write an INSERT ... ON CONFLICT DO UPDATE SQL query that stores the AI insights in your competitor_profiles or market_trends table, mapping values from the parsed AI output.
   - **OpenAI (Generate Personalized Sales Talking Points, optional branch):** This node can be part of the main workflow or a separate, manually triggered workflow. Configure it like the main AI node, but with a prompt tailored to generate sales talking points from a specific sales context and the stored insights.
4. **Final Testing & Activation**
   - Run a Test: Before going live, manually trigger the workflow from the first node. Carefully review the data at each stage to ensure correct processing and output. Verify that reports are generated, notifications are sent, and data is stored correctly.
   - Activate Workflow: Once testing is complete and successful, activate the workflow in n8n.
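The Code (Pre-process Data for AI) step described above could look roughly like the sketch below. It is written as a plain function so it can be tested outside n8n. In a Code node you might end with `return prepareForAi($input.all());`. The input field names are assumptions: scraped items with `html`/`text`/`url`/`title`, and RSS items with `title`/`link` plus `content` or `contentSnippet`. They vary by scraping service. The regex-based HTML stripping is deliberately simple and is not a full HTML parser.

```javascript
// Sketch of the "Pre-process Data for AI" Code node logic.
// Input field names are assumptions; adapt them to your scraper and RSS output.
function stripHtml(html) {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop scripts and styles
    .replace(/<[^>]+>/g, " ") // drop remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&")
    .replace(/\s+/g, " ") // collapse whitespace
    .trim();
}

// Returns n8n-style items: [{ json: { content, title, url, source } }, ...]
function prepareForAi(items, maxChars = 4000) {
  return items
    .map((item) => {
      const j = item.json ?? item;
      const raw = j.content ?? j.contentSnippet ?? j.html ?? j.text ?? "";
      return {
        json: {
          // Truncate to keep the combined prompt within the model's context budget.
          content: stripHtml(String(raw)).slice(0, maxChars),
          title: j.title ?? "",
          url: j.url ?? j.link ?? "",
          // Heuristic (assumption): RSS items carry `link`, scraped pages do not.
          source: j.source ?? (j.link ? "rss" : "web"),
        },
      };
    })
    .filter((item) => item.json.content.length > 0); // skip empty pages
}
```

The `maxChars` limit of 4000 is an arbitrary placeholder. Tune it to your model's context window and to the number of sources you combine into one prompt.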
This system will empower your IT company's sales team with invaluable, data-driven intelligence, enabling them to close more deals and stay ahead in the market.
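Several steps above call JSON.parse on choices[0].message.content inline, which fails the whole execution if the model wraps its answer in Markdown code fences or returns malformed JSON. A minimal sketch of a more defensive parse, assuming the raw OpenAI output shape those expressions imply; parseAiInsights is a hypothetical helper name, not part of n8n:

```javascript
// Minimal sketch: parse the JSON the model returned before mapping its fields.
// Assumption: the OpenAI node's raw output looks like
// { choices: [{ message: { content: "<JSON text>" } }] }.

function parseAiInsights(openAiOutput) {
  const content = openAiOutput?.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("OpenAI output has no message content");
  }
  // Models sometimes wrap JSON in Markdown code fences; strip them if present.
  const cleaned = content
    .trim()
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/\s*```$/, "");
  try {
    return JSON.parse(cleaned);
  } catch (err) {
    // Re-throw with a clearer message so the failing execution is easy to spot.
    throw new Error(`AI output is not valid JSON: ${err.message}`);
  }
}
```

One option is a Code node placed right after the OpenAI node that returns `$input.all().map(item => ({ json: parseAiInsights(item.json) }))`; the node that follows could then read fields such as `$json.Competitor_Name` directly, at the cost of one extra node in the workflow.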
by franck fambou
Overview
This advanced automation workflow combines deep web scraping with Retrieval-Augmented Generation (RAG) to turn websites into intelligent, queryable knowledge bases. The system recursively crawls target websites, extracts their content, and indexes all data in a vector database so an AI chat can query it.

How the system works
Intelligent Web Scraping and RAG Pipeline
- Recursive Web Scraper - Automatically crawls every accessible page of a target website
- Data Extraction - Collects text, metadata, emails, links, and PDF documents
- Supabase Integration - Stores content in PostgreSQL tables for scalability
- RAG Vectorization - Generates embeddings and stores them for semantic search
- AI Query Layer - Connects the embeddings to an AI chat engine that cites its sources
- Error Handling - Automatically retriggers failed requests

Setup Instructions
Estimated setup time: 30-45 minutes

Prerequisites
- Self-hosted n8n instance (v0.200.0 or higher)
- Supabase account and project (PostgreSQL enabled)
- OpenAI or Gemini API key for embeddings, and an OpenAI, Gemini, or Claude API key for chat
- Optional: External vector database (Pinecone, Qdrant)

Detailed configuration steps
Step 1: Supabase configuration
- **Project creation**: Create a new Supabase project with PostgreSQL enabled
- **Credentials**: Generate API keys (anon key and service_role key) and a connection string
- **Security configuration**: Set RLS policies according to your access requirements

Step 2: Connect Supabase to n8n
- **Configure the Supabase node**: Add your credentials under n8n Credentials
- **Test the connection**: Verify it with a simple query
- **Configure PostgreSQL**: Add a direct connection for advanced operations

Step 3: Preparing the database
- **Main tables**:
  - pages: URLs, content, metadata, scraping statuses
  - documents: Extracted and processed PDF files
  - embeddings: Vectors for semantic search
  - links: Link graph for navigation
- **Management functions**: Scripts to reactivate failed URLs and manage retries

Step 4: Configuring the automation
- **Recursive scraper**: Starting URL, crawling depth, CSS selectors
- **HTTP extraction**: User-Agent, headers, timeouts, and retry policies
- **Supabase storage**: Batch insertion, data validation, duplicate management

Step 5: Error handling and re-executions
- **Failure monitoring**: Automatic detection of failed URLs
- **Manual triggers**: Selective re-execution by domain or date
- **Recovery sub-workflows**: Retry logic with exponential backoff

Step 6: RAG processing
- **Embedding generation**: Text-embedding models with intelligent chunking
- **Vector storage**: Supabase pgvector or an external vector database
- **Conversational engine**: Connection to chat models with source citations

Data structure
Main Supabase tables

| Table | Content | Usage |
|-------|---------|-------|
| pages | URLs, HTML content, metadata | Main storage for scraped content |
| documents | PDF files, extracted text | Downloaded and processed documents |
| embeddings | Vectors, text chunks | Semantic search and RAG |
| links | Link graph, navigation | Relationships between pages |

Use cases
Business and enterprise
- Competitive intelligence with conversational querying
- Market research across complex web domains
- Compliance monitoring and regulatory watch

Research and academia
- Literature extraction with semantic search
- Building datasets from fragmented sources

Legal and technical
- Scraping legal repositories with intelligent queries
- Turning technical documentation into a conversational assistant

Key features
Advanced scraping
- Recursive crawling with automatic link discovery
- Multi-format extraction (HTML, PDF, emails)
- Intelligent error handling and retry

Intelligent RAG
- Contextual embeddings for semantic search
- Multi-document queries with citations
- Intuitive conversational interface

Performance and scalability
- Processing of thousands of pages per execution
- Embedding cache for fast responses
- Scalable architecture with Supabase

Technical Architecture
Main flow: Target URL → Recursive scraping → Content extraction → Supabase storage → Vectorization → Conversational interface
Supported types: HTML pages, PDF documents, metadata, links, emails

Performance specifications
- **Capacity**: 10,000+ pages per run
- **Response time**: < 5 seconds for RAG queries
- **Accuracy**: >90% relevance for specific domains
- **Scalability**: Distributed architecture via Supabase

Advanced configuration
Customization
- Crawling depth and scope controls
- Domain and content type filters
- Chunking settings to optimize RAG

Monitoring
- Real-time monitoring in Supabase
- Cost and performance metrics
- Detailed conversation logs
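Step 6 mentions "intelligent chunking" before embedding, but the template does not show its chunking logic. As a point of reference, here is a minimal JavaScript sketch of plain fixed-size chunking with overlap, a simpler technique than whatever the template uses, which could run in an n8n Code node before the embedding step. Sizes are counted in characters rather than model tokens, and chunkSize, overlap, and the row fields (url, chunk_index, content) are hypothetical choices, not the template's actual schema.

```javascript
// Minimal sketch: fixed-size chunking with overlap before embedding.
// Assumptions: sizes are in characters, not model tokens, and the defaults
// (1000 / 200) are illustrative, not tuned values from this template.

function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be >= 0 and smaller than chunkSize");
  }
  const clean = String(text).replace(/\s+/g, " ").trim();
  const step = chunkSize - overlap; // how far each new chunk advances
  const chunks = [];
  for (let start = 0; start < clean.length; start += step) {
    chunks.push(clean.slice(start, start + chunkSize));
    if (start + chunkSize >= clean.length) break; // this chunk reached the end
  }
  return chunks;
}

// Hypothetical row shape for an embeddings table: keeping the page URL with
// each chunk lets the chat layer cite where an answer came from.
function toEmbeddingRows(page, chunkSize, overlap) {
  return chunkText(page.content, chunkSize, overlap).map((content, index) => ({
    url: page.url,
    chunk_index: index,
    content,
  }));
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either neighbouring chunk. In a Code node, `return $input.all().flatMap(item => toEmbeddingRows(item.json, 1000, 200).map(row => ({ json: row })));` would emit one item per chunk, assuming each incoming item carries url and content fields.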