by Ian Kerins
Overview

This n8n template automates the process of scraping job listings from Indeed, parsing the data into a structured format, and saving it to Google Sheets for easy tracking. It also includes a Slack notification system to alert you when new jobs are found. Built with ScrapeOps, it handles the complexities of web scraping - such as proxy rotation, anti-bot bypassing, and HTML parsing - so you can focus on the data.

Who is this for?
- **Job Seekers**: Automate your daily job search and get instant alerts for new postings.
- **Recruiters & HR Agencies**: Track hiring trends and find new leads for candidate placement.
- **Sales & Marketing Teams**: Monitor companies that are hiring to identify growth signals and lead opportunities.
- **Data Analysts**: Gather labor market data for research and competitive analysis.

What problems it solves
- **Manual Searching**: Eliminates the need to manually refresh Indeed and copy-paste job details.
- **Data Structure**: Converts messy HTML into clean, organized rows in a spreadsheet.
- **Blocking & Captchas**: Uses ScrapeOps residential proxies to bypass Indeed's anti-bot protections reliably.
- **Missed Opportunities**: Automated scheduling ensures you are the first to know about new listings.

How it works
1. **Trigger**: The workflow runs on a schedule (default: every 6 hours).
2. **Configuration**: You define your search query (e.g., "Software Engineer") in the Set Search URL node.
3. **Scraping**: The ScrapeOps Proxy API fetches the Indeed search results page using a residential proxy to avoid detection.
4. **Parsing**: The ScrapeOps Parser API takes the raw HTML and extracts key details like Job Title, Company, Location, Salary, and URL.
5. **Filtering**: A code node filters out invalid results and structures the data (see the sketch at the end of this description).
6. **Storage**: Valid jobs are appended to a Google Sheet.
7. **Notification**: A message is sent to Slack confirming the update.

Setup steps (~10-15 minutes)
1. **ScrapeOps Account**: Register for a free ScrapeOps API Key. In n8n, open the ScrapeOps nodes and create a new credential with your API key.
2. **Google Sheets**: Duplicate this Google Sheet Template. Open the Save to Google Sheets node. Connect your Google account and select your duplicated sheet.
3. **Slack Setup**: Open the Send a message node. Connect your Slack account and select the channel where you want to receive alerts.
4. **Customize Search**: Open the Set Search URL node. Update the search_query value to the job title or keyword you want to track.

Pre-conditions
- An active ScrapeOps account (free tier available).
- A Google Cloud account with the Google Sheets API enabled (for the n8n connection).
- A Slack workspace for notifications.

Disclaimer

This template uses ScrapeOps as a community node. You are responsible for complying with Indeed's Terms of Use, robots directives, and applicable laws in your jurisdiction. Scraping targets may change at any time; adjust render/scroll/wait settings and parsers as needed. Use responsibly for legitimate business purposes.

Resources
- ScrapeOps n8n Overview
- ScrapeOps Proxy API Documentation
- ScrapeOps Parser API Documentation
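As a reference for step 5 of "How it works", here is a minimal sketch of what the filtering Code node could look like. The field names (`jobTitle`, `company`, `location`, `salary`, `url`) are assumptions — map them to whatever the ScrapeOps Parser API actually returns.

```javascript
// Minimal filter/structure sketch for the Code node in step 5.
// Field names are assumptions — adjust them to the parser output you receive.
return $input.all()
  .filter((item) => item.json.jobTitle && item.json.url)   // drop incomplete rows
  .map((item) => ({
    json: {
      jobTitle: item.json.jobTitle.trim(),
      company: item.json.company || 'Unknown',
      location: item.json.location || '',
      salary: item.json.salary || 'Not listed',
      url: item.json.url,
      scrapedAt: new Date().toISOString(),
    },
  }));
```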
by vinci-king-01
Certification Requirement Tracker with Rocket.Chat and GitLab

⚠️ COMMUNITY TEMPLATE DISCLAIMER: This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template.

This workflow automatically monitors websites of certification bodies and industry associations, detects changes in certification requirements, commits the updated information to a GitLab repository, and notifies a Rocket.Chat channel. Ideal for professionals and compliance teams who must stay ahead of annual updates and renewal deadlines.

Pre-conditions/Requirements

Prerequisites
- Running n8n instance (self-hosted or n8n.cloud)
- ScrapeGraphAI community node installed and active
- Rocket.Chat workspace (self-hosted or cloud)
- GitLab account and repository for documentation
- Publicly reachable URL for incoming webhooks (use n8n tunnel, Ngrok, or a reverse proxy)

Required Credentials
- **ScrapeGraphAI API Key** – Enables scraping of certification pages
- **Rocket.Chat Access Token & Server URL** – To post update messages
- **GitLab Personal Access Token** – With api and write_repository scopes

Specific Setup Requirements

| Item | Example Value | Notes |
| --- | --- | --- |
| GitLab Repo | gitlab.com/company/cert-tracker | Markdown files will be committed here |
| Rocket.Chat Channel | #certification-updates | Receives update alerts |
| Certification Source URLs file | /data/sourceList.json in the repository | List of URLs to scrape |

How it works

Key Steps:
1. **Webhook Trigger**: Fires on a scheduled HTTP call (e.g., via cron) or manual trigger.
2. **Code (Prepare Source List)**: Reads/constructs a list of certification URLs to scrape.
3. **ScrapeGraphAI**: Fetches HTML content and extracts requirement sections.
4. **Merge**: Combines newly scraped data with the last committed snapshot.
5. **IF Node**: Determines whether a change occurred (hash/length comparison; a sketch appears after the customization examples).
6. **GitLab**: Creates a branch, commits updated Markdown/JSON files, and opens an MR (optional).
7. **Rocket.Chat**: Posts a message summarizing changes and linking to the GitLab diff.
8. **Respond to Webhook**: Returns a JSON summary to the requester (useful for monitoring or chained automations).

Set up steps

Setup Time: 20-30 minutes

1. Install Community Node: In the n8n UI, go to Settings → Community Nodes and install @n8n/community-node-scrapegraphai.
2. Create Credentials:
   a. ScrapeGraphAI – paste your API key.
   b. Rocket.Chat – create a personal access token (Personal Access Tokens → New Token) and configure credentials.
   c. GitLab – create a PAT with api + write_repository scopes and add it to n8n.
3. Clone the Template: Import this workflow JSON into your n8n instance.
4. Edit StickyNote: Replace placeholder URLs with actual certification-source URLs or point to a repo file.
5. Configure GitLab Node: Set your repository, default branch, and commit message template.
6. Configure Rocket.Chat Node: Select credential, channel, and message template (markdown supported).
7. Expose Webhook: If self-hosting, enable n8n tunnel or configure a reverse proxy to make the webhook public.
8. Test Run: Trigger the workflow manually; verify the GitLab commit/MR and the Rocket.Chat notification.
9. Automate: Schedule an external cron (or an n8n Cron node) to POST to the webhook yearly, quarterly, or monthly as needed.

Node Descriptions

Core Workflow Nodes:
- **stickyNote** – Human-readable instructions/documentation embedded in the flow.
- **webhook** – Entry point; accepts POST /cert-tracker requests.
- **code (Prepare Source List)** – Generates an array of URLs; can pull from GitLab or an environment variable.
- **scrapegraphAi** – Scrapes each URL and extracts certification requirement sections using CSS/XPath selectors.
- **merge (by key)** – Joins new data with the previous snapshot for change detection.
- **if (Changes?)** – Branches logic based on whether differences exist.
- **gitlab** – Creates/updates files and opens merge requests containing new requirements.
- **rocketchat** – Sends a formatted update to the designated channel.
- **respondToWebhook** – Returns 200 OK with a JSON summary.

Data Flow:
- webhook → code → scrapegraphAi → merge → if
- if (true) → gitlab → rocketchat
- if (false) → respondToWebhook

Customization Examples

Change Scraping Frequency

```
// Replace external cron with n8n Cron node
{
  "nodes": [
    {
      "name": "Cron",
      "type": "n8n-nodes-base.cron",
      "parameters": {
        "schedule": { "hour": "0", "minute": "0", "dayOfMonth": "1" }
      }
    }
  ]
}
```

Extend Notification Message

```javascript
// Rocket.Chat node → Message field
const diffUrl = $json["gitlab_diff_url"];
const count = $json["changes_count"];
return `:bell: ${count} Certification Requirement Update(s)\n\nView diff: ${diffUrl}`;
```

Data Output Format

The workflow outputs structured JSON data:

```json
{
  "timestamp": "2024-05-15T12:00:00Z",
  "changesDetected": true,
  "changesCount": 3,
  "gitlab_commit_sha": "a1b2c3d4",
  "gitlab_diff_url": "https://gitlab.com/company/cert-tracker/-/merge_requests/42",
  "notifiedChannel": "#certification-updates"
}
```

Troubleshooting

Common Issues
- ScrapeGraphAI returns empty results – Verify your CSS/XPath selectors and API key quota.
- GitLab commit fails (401 Unauthorized) – Ensure the PAT has api and write_repository scopes and is not expired.

Performance Tips
- Limit the number of pages scraped per run to avoid API rate limits.
- Cache last-scraped HTML in an S3 bucket or database to reduce redundant requests.

Pro Tips:
- Use GitLab CI to auto-deploy the documentation site whenever new certification files are merged.
- Enable Rocket.Chat threading to keep discussions organized per update.
- Tag stakeholders in Rocket.Chat messages with @cert-team for instant visibility.
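For the "hash/length comparison" mentioned in the IF step, a minimal sketch of the change-detection logic is below. It assumes the merged item carries `current` (newly scraped text) and `previous` (last committed snapshot) — adjust the field names to your actual data.

```javascript
// Minimal change-detection sketch for a Code node placed before the IF node.
// `current` and `previous` are assumed field names from the Merge step.
return $input.all().map((item) => {
  const current = JSON.stringify(item.json.current ?? {});
  const previous = JSON.stringify(item.json.previous ?? {});
  return {
    json: {
      ...item.json,
      changed: current !== previous,                  // simple equality check
      lengthDelta: current.length - previous.length,  // quick "length comparison" signal
    },
  };
});
```

The IF node can then branch on `{{ $json.changed }}`.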
by Harvex AI
AI Lead Enrichment & Notification System

This n8n template automates the lead enrichment process for your business. Once a lead fills out a form, the workflow scrapes their website, provides a summary of their business, and logs everything into a CRM before notifying your team on Slack.

Some use cases: "Speed-to-Lead" optimization, lead enrichment, automated prospect research.

How it works
1. **Ingestion**: A lead submits their details (Name, Email, Website) via a form.
2. **Intelligent scraping**: The workflow scrapes the provided URL.
3. **AI analysis**: OpenAI's GPT-4o model analyzes the extracted data and determines whether there is enough information or whether the workflow needs to scrape the "About Us" page as well (see the sketch at the end of this description).
4. **CRM sync**: The CRM (Airtable) is updated with the enriched data.
5. **Notification**: An instant Slack notification is sent to the team channel.

How to use the workflow
1. Configure the form: Open the trigger form and input the required fields.
2. Set up OpenAI: Ensure that your credentials are connected.
3. Database mapping: Ensure your Airtable base has the following columns: Name, Website, AI Insight, Email, and Date.
4. Slack setup: Specify the desired Slack channel for notifications.
5. Test it out! Open the form, enter sample data (with a real website), and watch the system enrich the lead for you.

Requirements
- **OpenAI API Key** (for analyzing website content and generating summaries)
- **Airtable** (for CRM and logging)
- **Slack** (for team notifications)
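A hypothetical "is there enough info?" check could also be done before the AI step with a plain Code node. This is only a sketch under assumed field names (`html` for the scraped page, `website` for the form URL, which must include the protocol) — the template itself delegates this decision to the GPT-4o step.

```javascript
// Sketch: decide whether the homepage text is enough, or whether an
// "About Us" fallback scrape is needed. Field names are assumptions.
const MIN_CHARS = 500;

return $input.all().map((item) => {
  const html = item.json.html || '';
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<style[\s\S]*?<\/style>/gi, '')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  const needsAboutPage = text.length < MIN_CHARS;
  return {
    json: {
      ...item.json,
      pageText: text,
      needsAboutPage,
      // assumes `website` is a full URL, e.g. "https://example.com"
      aboutUrl: needsAboutPage ? new URL('/about', item.json.website).toString() : null,
    },
  };
});
```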
by Intuz
This n8n template from Intuz provides a complete solution to automate on-demand lead generation. It acts as a powerful scraping agent that takes a simple chat query, scours both Google Search and Google Maps for relevant businesses, scrapes their websites for contact details, and compiles an enriched lead list directly in Google Sheets.

Who's this workflow for?
- Sales Development Representatives (SDRs)
- Local Marketing Agencies
- Business Development Teams
- Freelancers & Consultants
- Market Researchers

How it works
1. **Start with a Chat Query**: The user initiates the workflow by typing a search query (e.g., "dentists in New York") into a chat interface.
2. **Multi-Source Search**: The workflow queries both the Google Custom Search API (for web results across multiple pages) and scrapes Google Maps (for local businesses) to gather a broad list of potential leads.
3. **Deep Dive Website Scraping**: For each unique business website found, the workflow visits the URL to scrape the raw HTML content of the page.
4. **Intelligent Contact Extraction**: Using custom code, it then parses the scraped website content to find and extract valuable contact information such as email addresses, phone numbers, and social media links (see the sketch at the end of this description).
5. **Deduplicate and Log to Sheets**: Before saving, the workflow checks your Google Sheet to ensure the lead doesn't already exist. All unique, newly enriched leads are then appended as clean rows to your sheet, along with the original search query for tracking.

Key Requirements to Use This Template
1. **n8n Instance & Required Nodes**: An active n8n account (Cloud or self-hosted). This workflow uses the official n8n LangChain integration (@n8n/n8n-nodes-langchain) for the chat trigger. If you are using a self-hosted version of n8n, please ensure this package is installed.
2. **Google Custom Search API**: A Google Cloud Project with the "Custom Search API" enabled. You will need an API Key for this service. You must also create a Programmable Search Engine and get its Search engine ID (cx). This tells Google what to search (e.g., the whole web).
3. **Google Sheets Account**: A Google account and a pre-made Google Sheet with columns for Business Name, Primary Email, Contact Number, URL, Description, Socials, and Search Query.

Setup Instructions
1. **Configure the Chat Trigger**: In the "When chat message received" node, you can find the Direct URL or Embed code to use the chat interface.
2. **Set Up Google Custom Search API (Crucial Step)**: Go to the "Custom Google Search API" (HTTP Request) node. Under "Query Parameters", you must replace the placeholder values for key (with your API Key) and cx (with your Search Engine ID).
3. **Configure Google Sheets**: In all Google Sheets nodes (Append row in sheet, Get row(s) in sheet, etc.), connect your Google Sheets credentials. Select your target spreadsheet (Document ID) and the specific sheet (Sheet Name) where you want to store the leads.
4. **Activate the Workflow**: Save the workflow and toggle the "Active" switch to ON. Open the chat URL and enter a search query to start generating leads.

Connect with us
- Website: https://www.intuz.com/services
- Email: getstarted@intuz.com
- LinkedIn: https://www.linkedin.com/company/intuz
- Get Started: https://n8n.partnerlinks.io/intuz
- For custom workflow automation, click here: Get Started
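A rough sketch of the contact-extraction step (step 4) is below. It assumes the upstream HTTP Request node returns the page HTML in `item.json.data` and that `businessName`/`url` are passed through from the search step; the patterns are deliberately simple and should be tuned for your target markets.

```javascript
// Sketch of the "Intelligent Contact Extraction" Code node (assumed field names).
const EMAIL_RE = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
const PHONE_RE = /\+?\d[\d\s().-]{7,}\d/g;
const SOCIAL_RE = /https?:\/\/(?:www\.)?(?:linkedin|facebook|instagram|twitter|x)\.com\/[^\s"'<>]+/gi;

return $input.all().map((item) => {
  const html = item.json.data || '';
  const unique = (arr) => [...new Set(arr)];
  return {
    json: {
      businessName: item.json.businessName, // assumed passthrough from the search step
      url: item.json.url,
      emails: unique(html.match(EMAIL_RE) || []),
      phones: unique((html.match(PHONE_RE) || []).map((p) => p.trim())),
      socials: unique(html.match(SOCIAL_RE) || []),
    },
  };
});
```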
by Onur
🔍 Extract Competitor SERP Rankings from Google Search to Sheets with Scrape.do

This template requires a self-hosted n8n instance to run.

A complete n8n automation that extracts competitor data from Google search results for specific keywords and target countries using the Scrape.do SERP API, and saves structured results into Google Sheets for SEO, competitive analysis, and market research.

📋 Overview

This workflow provides a lightweight competitor analysis solution that identifies ranking websites for chosen keywords across different countries. Ideal for SEO specialists, content strategists, and digital marketers who need structured SERP insights without manual effort.

Who is this for?
- SEO professionals tracking keyword competitors
- Digital marketers conducting market analysis
- Content strategists planning based on SERP insights
- Business analysts researching competitor positioning
- Agencies automating SEO reporting

What problem does this workflow solve?
- Eliminates manual SERP scraping
- Processes multiple keywords across countries
- Extracts structured data (position, title, URL, description)
- Automates saving results into Google Sheets
- Ensures a repeatable and consistent methodology

⚙️ What this workflow does
1. Manual Trigger → Starts the workflow manually
2. Get Keywords from Sheet → Reads keywords + target countries from a Google Sheet
3. URL Encode Keywords → Converts keywords into a URL-safe format
4. Process Keywords in Batches → Handles multiple keywords sequentially to avoid rate limits
5. Fetch Google Search Results → Calls the Scrape.do SERP API to retrieve the raw HTML of Google SERPs
6. Extract Competitor Data from HTML → Parses the HTML into structured competitor data (top 10 results); a parsing sketch is included at the end of this description
7. Append Results to Sheet → Writes structured SERP results into a Google Sheet

📊 Output Data Points

| Field | Description | Example |
|--------------------|------------------------------------------|-------------------------------------------|
| Keyword | Original search term | digital marketing services |
| Target Country | 2-letter ISO code of target region | US |
| position | Ranking position in search results | 1 |
| websiteTitle | Page title from SERP result | Digital Marketing Software & Tools |
| websiteUrl | Extracted website URL | https://www.hubspot.com/marketing |
| websiteDescription | Snippet/description from search results | Grow your business with HubSpot’s tools… |

⚙️ Setup

Prerequisites
- n8n instance (self-hosted)
- Google account with Sheets access
- **Scrape.do** account with **SERP API token**

Google Sheet Structure

This workflow uses one Google Sheet with two tabs:

Input Tab: "Keywords"

| Column | Type | Description | Example |
|----------|------|-------------|---------|
| Keyword | Text | Search query | digital marketing |
| Target Country | Text | 2-letter ISO code | US |

Output Tab: "Results"

| Column | Type | Description | Example |
|--------------------|-------|-------------|---------|
| Keyword | Text | Original search term | digital marketing |
| position | Number| SERP ranking | 1 |
| websiteTitle | Text | Title of the page | Digital Marketing Software & Tools |
| websiteUrl | URL | Website/page URL | https://www.hubspot.com/marketing |
| websiteDescription | Text | Snippet text | Grow your business with HubSpot’s tools |

🛠 Step-by-Step Setup
1. Import Workflow: Copy the JSON → n8n → Workflows → + Add → Import from JSON
2. Configure **Scrape.do API**:
   - Endpoint: https://api.scrape.do/
   - Parameter: token=YOUR_SCRAPEDO_TOKEN
   - Add render=true for full HTML rendering
3. Configure Google Sheets:
   - Create a sheet with two tabs: Keywords (input),
     Results (output)
   - Set up Google Sheets OAuth2 credentials in n8n
   - Replace the placeholders YOUR_GOOGLE_SHEET_ID and YOUR_GOOGLE_SHEETS_CREDENTIAL_ID
4. Run & Test:
   - Add test data in the Keywords tab
   - Execute workflow → Check results in the Results tab

🧰 How to Customize
- **Add more fields**: Extend the HTML parsing logic in the “Extract Competitor Data” node to capture extra data (e.g., domain, sitelinks).
- **Filtering**: Exclude domains or results with custom rules.
- **Batch Size**: Adjust “Process Keywords in Batches” to balance speed against rate limits.
- **Rate Limiting**: Insert a **Wait** node (e.g., 10–30 seconds) if API rate limits apply.
- **Multi-Sheet Output**: Save per-country or per-keyword results into separate tabs.

📊 Use Cases
- **SEO Competitor Analysis**: Identify top-ranking sites for target keywords
- **Market Research**: See how SERPs differ by region
- **Content Strategy**: Analyze titles and descriptions of competitor pages
- **Agency Reporting**: Automate competitor SERP snapshots for clients

📈 Performance & Limits
- **Single Keyword**: ~10–20 seconds (depends on the Scrape.do response)
- **Batch of 10**: 3–5 minutes typical
- **Large Sets (50+)**: 20–40 minutes depending on API credits and batching
- **API Calls**: 1 Scrape.do request per keyword
- **Reliability**: 95%+ extraction success, 98%+ data accuracy

🧩 Troubleshooting
- **API error** → Check YOUR_SCRAPEDO_TOKEN and API credits
- **No keywords loaded** → Verify the Google Sheet ID and that the tab name is Keywords
- **Permission denied** → Re-authenticate Google Sheets OAuth2 in n8n
- **Empty results** → Check the parsing logic and verify search term validity
- **Workflow stops early** → Ensure the batching loop (SplitInBatches) is properly connected

🤝 Support & Community
- n8n Forum: https://community.n8n.io
- n8n Docs: https://docs.n8n.io
- Scrape.do Dashboard: https://dashboard.scrape.do

🎯 Final Notes

This workflow provides a repeatable foundation for extracting competitor SERP rankings with Scrape.do and saving them to Google Sheets. You can extend it with filtering, richer parsing, or integration with reporting dashboards to create a fully automated SEO intelligence pipeline.
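For reference, a rough sketch of the "Extract Competitor Data from HTML" Code node follows. Google's SERP markup changes frequently, so the pattern below is illustrative only — verify it against the HTML that Scrape.do actually returns (assumed here to arrive in `item.json.data` as raw HTML, with `Keyword` and `Target Country` passed through from the sheet).

```javascript
// Illustrative SERP parsing sketch — adjust the regex to the real markup.
return $input.all().flatMap((item) => {
  const html = item.json.data || '';
  const results = [];
  // Very loose pattern: an anchor wrapping an <h3> title, as in classic SERP markup.
  const blockRe = /<a href="(https?:\/\/[^"]+)"[^>]*>\s*<h3[^>]*>(.*?)<\/h3>/gis;
  let match;
  let position = 1;
  while ((match = blockRe.exec(html)) !== null && position <= 10) {
    results.push({
      json: {
        Keyword: item.json.Keyword,
        'Target Country': item.json['Target Country'],
        position: position++,
        websiteTitle: match[2].replace(/<[^>]+>/g, '').trim(),
        websiteUrl: match[1],
      },
    });
  }
  return results;
});
```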
by vinci-king-01
Job Posting Aggregator with Email and GitHub

⚠️ COMMUNITY TEMPLATE DISCLAIMER: This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template.

This workflow automatically aggregates certification-related job-posting requirements from multiple industry sources, compares them against last year’s data stored in GitHub, and emails a concise change log to subscribed professionals. It streamlines annual requirement checks and renewal reminders, ensuring users never miss an update.

Pre-conditions/Requirements

Prerequisites
- n8n instance (self-hosted or n8n cloud)
- ScrapeGraphAI community node installed
- Git installed (for optional local testing of the repo)
- Working SMTP server or another Email credential supported by n8n

Required Credentials
- **ScrapeGraphAI API Key** – Enables web scraping of certification pages
- **GitHub Personal Access Token** – Allows the workflow to read/write files in the repo
- **Email / SMTP Credentials** – Sends the summary email to end-users

Specific Setup Requirements

| Resource | Purpose | Example |
|----------|---------|---------|
| GitHub Repository | Stores certification_requirements.json versioned annually | https://github.com/<you>/cert-requirements.git |
| Watch List File | List of page URLs & selectors to scrape | Saved in the repo under /config/watchList.json |
| Email List | Semicolon-separated list of recipients | me@company.com;team@company.com |

How it works

Key Steps:
1. **Manual Trigger**: Starts the workflow on demand or via scheduled cron.
2. **Load Watch List (Code Node)**: Reads the list of certification URLs and CSS selectors.
3. **Split In Batches**: Iterates through each URL to avoid rate limits.
4. **ScrapeGraphAI**: Scrapes requirement details from each page.
5. **Merge (Wait)**: Reassembles individual scrape results into a single JSON array.
6. **GitHub (Read File)**: Retrieves last year’s certification_requirements.json.
7. **IF (Change Detector)**: Compares current vs. previous JSON and decides whether changes exist (see the diff sketch after the node descriptions).
8. **Email Send**: Composes and sends a formatted summary of changes.
9. **GitHub (Upsert File)**: Commits the new JSON file back to the repo for future comparisons.

Set up steps

Setup Time: 15-25 minutes

1. Install Community Node: From the n8n UI → Settings → Community Nodes → search for and install “ScrapeGraphAI”.
2. Create/Clone GitHub Repo: Add an empty certification_requirements.json ( {} ) and a config/watchList.json with an array of objects like:

   ```json
   [
     { "url": "https://cert-body.org/requirements", "selector": "#requirements" }
   ]
   ```

3. Generate GitHub PAT: Scope repo, store it in n8n Credentials as “GitHub API”.
4. Add ScrapeGraphAI Credential: Paste your API key into n8n Credentials.
5. Configure Email Credentials: E.g., SMTP with username/password or OAuth2.
6. Open Workflow: Import the template JSON into n8n.
7. Update Environment Variables (in the Code node or via n8n variables):
   - GITHUB_REPO (e.g., user/cert-requirements)
   - EMAIL_RECIPIENTS
8. Test Run: Trigger manually. Verify the email content and the GitHub commit.
9. Schedule: Add a Cron node (optional) for yearly or quarterly automatic runs.
Node Descriptions

Core Workflow Nodes:
- **Manual Trigger** – Initiates the workflow manually or via an external schedule.
- **Code (Load Watch List)** – Reads and parses watchList.json from GitHub or static input.
- **SplitInBatches** – Controls request concurrency to avoid scraping bans.
- **ScrapeGraphAI** – Extracts requirement text using the provided CSS selectors or XPath.
- **Merge (Combine)** – Waits for all batches and merges them into one dataset.
- **GitHub (Read/Write File)** – Handles version-controlled storage of the JSON data.
- **IF (Change Detector)** – Compares hashes/JSON diffs to detect updates.
- **EmailSend** – Sends the change log, including renewal reminders and a diff summary.
- **Sticky Note** – Provides in-workflow documentation for future editors.

Data Flow:
- Manual Trigger → Code (Load Watch List) → SplitInBatches
- SplitInBatches → ScrapeGraphAI → Merge
- Merge → GitHub (Read File) → IF (Change Detector)
- IF (True) → Email Send → GitHub (Upsert File)

Customization Examples

Adjusting Scraper Configuration

```json
// Inside the Watch List JSON object
{
  "url": "https://new-association.com/cert-update",
  "selector": ".content article:nth-of-type(1) ul"
}
```

Custom Email Template

```
// In Email Send node → HTML Content
📋 Certification Updates – {{ $json.date }}

The following certifications have new requirements:
{{ $json.diffHtml }}

For full details visit our GitHub repo.
```

Data Output Format

The workflow outputs structured JSON data:

```json
{
  "timestamp": "2024-09-01T12:00:00Z",
  "source": "watchList.json",
  "current": {
    "AWS-SAA": "Version 3.0, requires renewed proctored exam",
    "PMP": "60 PDUs every 3 years"
  },
  "previous": {
    "AWS-SAA": "Version 2.0",
    "PMP": "60 PDUs every 3 years"
  },
  "changes": {
    "AWS-SAA": "Updated to Version 3.0; exam format changed."
  }
}
```

Troubleshooting

Common Issues
- ScrapeGraphAI returns empty data – Check the CSS/XPath selectors and ensure the page is publicly accessible.
- GitHub authentication fails – Verify the PAT scope includes repo and that the credential is linked in both GitHub nodes.

Performance Tips
- Limit the SplitInBatches size to 3-5 URLs when sources are heavy to avoid timeouts.
- Enable n8n execution mode “Queue” for long-running scrapes.

Pro Tips:
- Store selector samples in comments next to each watch list entry for future maintenance.
- Use a Cron node set to “0 0 1 1 *” for an annual run exactly on Jan 1st.
- Add a Telegram node after Email Send for instant mobile notifications.
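A minimal sketch of the change-detector logic (a Code node feeding the IF node) is below, assuming `current` and `previous` are objects keyed by certification name as in the Data Output Format example above.

```javascript
// Sketch: build a `changes` object and a boolean the IF node can branch on.
const data = $input.first().json;
const current = data.current || {};
const previous = data.previous || {};

const changes = {};
for (const [cert, requirement] of Object.entries(current)) {
  if (previous[cert] !== requirement) {
    changes[cert] = `Changed from "${previous[cert] ?? 'n/a'}" to "${requirement}"`;
  }
}

return [{
  json: {
    ...data,
    changes,
    changesDetected: Object.keys(changes).length > 0,
  },
}];
```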
by Alejandro Scuncia
An extendable RAG template to build powerful, explainable AI assistants — with query understanding, semantic metadata, and support for free-tier tools like Gemini, Gemma and Supabase.

Description

This workflow helps you build smart, production-ready RAG agents that go far beyond basic document Q&A. It includes:
- ✅ File ingestion and chunking
- ✅ Asynchronous LLM-powered enrichment
- ✅ Filterable metadata-based search
- ✅ Gemma-based query understanding and generation
- ✅ Cohere re-ranking
- ✅ Memory persistence via Postgres

Everything is modular, low-cost, and designed to run even with free-tier LLMs and vector databases. Whether you want to build a chatbot, internal knowledge assistant, documentation search engine, or a filtered content explorer — this is your foundation.

⚙️ How It Works

This workflow is divided into 3 pipelines:

📥 Ingestion
- Upload a PDF via form
- Extract text and chunk it for embedding
- Store it in the Supabase vector store using Google Gemini embeddings

🧠 Enrichment (Async)
- A scheduled task fetches new chunks
- Each chunk is enriched with LLM metadata (topics, use_case, risks, audience level, summary, etc.)
- Metadata is added to the vector DB for improved retrieval and filtering

🤖 Agent Chat
- A user question triggers the RAG agent
- The Query Builder transforms it into keywords and filters (a sketch appears at the end of this description)
- The vector DB is queried and the results are reranked
- The final answer is generated using only the retrieved evidence, with references
- Chat memory is managed via Postgres

🌟 Key Features
- **Asynchronous enrichment** → Save tokens, batch process with free-tier LLMs like Gemma
- **Metadata-aware** → Improved filtering and reranking
- **Explainable answers** → Agent cites sources and sections
- **Chat memory** → Persistent context with Postgres
- **Modular design** → Swap LLMs, rerankers, vector DBs, and even the enrichment schema
- **Free to run** → Built with Gemini, Gemma, Cohere, Supabase (free tier-compatible)

🔐 Required Credentials

| Tool | Use |
|------|-----|
| Supabase w/ PostgreSQL | Vector DB + storage |
| Google Gemini/Gemma | Embeddings & LLM |
| Cohere API | Re-ranking |
| PostgreSQL | Chat memory |

🧰 Customization Tips
- Swap extractFromFile with Notion/Google Drive integrations
- Extend the Metadata Obtention prompt to fit your domain (e.g., financial, legal)
- Replace LLMs with OpenAI, Mistral, or Ollama
- Replace Postgres Chat Memory with Simple Memory or any other memory node
- Use a webhook instead of a form to automate ingestion
- Connect to a Telegram/Slack UI with a few extra nodes

💡 Use Cases
- Company knowledge base bot (internal docs, SOPs)
- Educational assistant with smart filtering (by topic or level)
- Legal or policy assistant that cites source sections
- Product documentation Q&A with multi-language support
- Training material assistant that highlights risks/examples
- Content generation

🧠 Who It’s For
- Indie developers building smart chatbots
- AI consultants prototyping Q&A assistants
- Teams looking for an internal knowledge agent
- Anyone building affordable, explainable AI tools

🚀 Try It Out!

Deploy a modular RAG assistant using n8n, Supabase, and Gemini — fully customizable and almost free to run.

1. 📁 Prepare Your PDFs
- Use any internal documents, manuals, or reports in *PDF* format.
- Optional: Add Google Drive integration to automate ingestion.

2. 🧩 Set Up Supabase
- Create a free Supabase project.
- Use the table creation queries included in the workflow to set up your schema.
- Add your *supabaseUrl* and *supabaseKey* in your n8n credentials.

> 💡 Pro Tip: Make sure you match the embedding dimensions to your model.
This workflow uses Gemini text-embedding-004 (768-dim) — if switching to OpenAI, change your table vector size to 1536.

3. 🧠 Connect Gemini & Gemma
- Use Gemini/Gemma for embeddings and optional metadata enrichment.
- Or deploy locally for lightweight async LLM processing (via Ollama/HuggingFace).

4. ⚙️ Import the Workflow in n8n
- Open n8n (self-hosted or cloud).
- Import the workflow file and paste your credentials.
- You’re ready to ingest, enrich, and query your document base.

💬 Have Feedback or Ideas? I’d Love to Hear

This project is open, modular, and evolving — just like great workflows should be :). If you’ve tried it, built on top of it, or have suggestions for improvement, I’d genuinely love to hear from you. Let’s share ideas, collaborate, or just connect as part of the n8n builder community.

📧 ascuncia.es@gmail.com
🔗 LinkedIn
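For reference, here is a minimal sketch of what the Query Builder step produces: search keywords plus metadata filters that the vector query can use. In the actual workflow this mapping comes from Gemma; the trivial keyword split below only illustrates the output shape, and the filter field names (`topics`, `audience_level`) mirror the enrichment fields described above but are assumptions — align them with your own enrichment schema.

```javascript
// Query Builder output sketch (assumed chat field name: chatInput).
const question = $input.first().json.chatInput || '';

const keywords = question
  .toLowerCase()
  .replace(/[^\w\s]/g, ' ')
  .split(/\s+/)
  .filter((w) => w.length > 3);

return [{
  json: {
    searchQuery: keywords.join(' '),
    filters: {
      topics: keywords.slice(0, 3),   // used later as a metadata filter
      audience_level: null,           // left open unless the question implies one
    },
  },
}];
```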
by Onur
🏠 Extract Zillow Property Data to Google Sheets with Scrape.do

This template requires a self-hosted n8n instance to run.

A complete n8n automation that extracts property listing data from Zillow URLs using the Scrape.do web scraping API, parses key property information, and saves structured results into Google Sheets for real estate analysis, market research, and property tracking.

📋 Overview

This workflow provides a lightweight real estate data extraction solution that pulls property details from Zillow listings and organizes them into a structured spreadsheet. Ideal for real estate professionals, investors, market analysts, and property managers who need automated property data collection without manual effort.

Who is this for?
- Real estate investors tracking properties
- Market analysts conducting property research
- Real estate agents monitoring listings
- Property managers organizing data
- Data analysts building real estate databases

What problem does this workflow solve?
- Eliminates manual copy-paste from Zillow
- Processes multiple property URLs in bulk
- Extracts structured data (price, address, zestimate, etc.)
- Automates saving results into Google Sheets
- Ensures repeatable and consistent data collection

⚙️ What this workflow does
1. Manual Trigger → Starts the workflow manually
2. Read Zillow URLs from Google Sheets → Reads property URLs from a Google Sheet
3. Scrape Zillow URL via Scrape.do → Fetches the full HTML from Zillow (bypasses PerimeterX protection)
4. Parse Zillow Data → Extracts structured property information from the HTML (a parsing sketch follows this description)
5. Write Results to Google Sheets → Saves parsed data into a results sheet

📊 Output Data Points

| Field | Description | Example |
|-------|-------------|---------|
| URL | Original Zillow listing URL | https://www.zillow.com/homedetails/... |
| Price | Property listing price | $300,000 |
| Address | Street address | 8926 Silver City |
| City | City name | San Antonio |
| State | State abbreviation | TX |
| Days on Zillow | How long listed | 5 |
| Zestimate | Zillow's estimated value | $297,800 |
| Scraped At | Timestamp of extraction | 2025-01-29T12:00:00.000Z |

⚙️ Setup

Prerequisites
- n8n instance (self-hosted)
- Google account with Sheets access
- Scrape.do account with API token (get 1000 free credits/month)

Google Sheet Structure

This workflow uses one Google Sheet with two tabs:

Input Tab: "Sheet1"

| Column | Type | Description | Example |
|--------|------|-------------|---------|
| URLs | URL | Zillow listing URL | https://www.zillow.com/homedetails/123... |

Output Tab: "Results"

| Column | Type | Description | Example |
|--------|------|-------------|---------|
| URL | URL | Original listing URL | https://www.zillow.com/homedetails/... |
| Price | Text | Property price | $300,000 |
| Address | Text | Street address | 8926 Silver City |
| City | Text | City name | San Antonio |
| State | Text | State code | TX |
| Days on Zillow | Number | Days listed | 5 |
| Zestimate | Text | Estimated value | $297,800 |
| Scraped At | Timestamp | When scraped | 2025-01-29T12:00:00.000Z |

🛠 Step-by-Step Setup
1. Import Workflow: Copy the JSON → n8n → Workflows → + Add → Import from JSON
2. Configure Scrape.do API:
   - Sign up at the Scrape.do Dashboard
   - Get your API token
   - In the HTTP Request node, replace YOUR_SCRAPE_DO_TOKEN with your actual token
   - The workflow uses super=true for premium residential proxies (10 credits per request)
3. Configure Google Sheets:
   - Create a new Google Sheet
   - Add two tabs: "Sheet1" (input) and "Results" (output)
   - In Sheet1, add the header "URLs" in cell A1
   - Add Zillow URLs starting from A2
   - Set up Google Sheets OAuth2 credentials in n8n
   - Replace YOUR_SPREADSHEET_ID with your actual Google Sheet ID
   - Replace YOUR_GOOGLE_SHEETS_CREDENTIAL_ID with your credential ID
4. Run & Test:
   - Add 1-2 test Zillow URLs in Sheet1
   - Click "Execute workflow"
   - Check the results in the Results tab

🧰 How to Customize
- **Add more fields**: Extend the parsing logic in the "Parse Zillow Data" node to capture additional data (bedrooms, bathrooms, square footage)
- **Filtering**: Add conditions to skip certain properties or price ranges
- **Rate Limiting**: Insert a Wait node between requests if processing many URLs
- **Error Handling**: Add error branches to handle failed scrapes gracefully
- **Scheduling**: Replace the Manual Trigger with a Schedule Trigger for automated daily/weekly runs

📊 Use Cases
- **Investment Analysis**: Track property prices and zestimates over time
- **Market Research**: Analyze listing trends in specific neighborhoods
- **Portfolio Management**: Monitor properties for sale in target areas
- **Competitive Analysis**: Compare similar properties across locations
- **Lead Generation**: Build databases of properties matching specific criteria

📈 Performance & Limits
- **Single Property**: ~5-10 seconds per URL
- **Batch of 10**: 1-2 minutes typical
- **Large Sets (50+)**: 5-10 minutes depending on Scrape.do credits
- **API Calls**: 1 Scrape.do request per URL (10 credits with super=true)
- **Reliability**: 95%+ success rate with premium proxies

🧩 Troubleshooting

| Problem | Solution |
|---------|----------|
| API error 400 | Check your Scrape.do token and credits |
| URL showing "undefined" | Verify the Google Sheet column name is "URLs" (capital U) |
| No data parsed | Check whether Zillow changed their HTML structure |
| Permission denied | Re-authenticate Google Sheets OAuth2 in n8n |
| 50000 character error | Verify the Parse Zillow Data code is extracting fields, not returning raw HTML |
| Price shows HTML/CSS | Update the price extraction regex in the Parse Zillow Data node |

🤝 Support & Community
- Scrape.do Documentation
- Scrape.do Dashboard
- Scrape.do Zillow Scraping Guide
- n8n Forum
- n8n Docs

🎯 Final Notes

This workflow provides a repeatable foundation for extracting Zillow property data with Scrape.do and saving it to Google Sheets. You can extend it with:
- Historical tracking (append timestamps)
- Price change alerts (compare with previous scrapes)
- Multi-platform scraping (Redfin, Realtor.com)
- Integration with CRM or reporting dashboards

Important: Scrape.do handles all anti-bot bypassing (PerimeterX, CAPTCHAs) automatically with rotating residential proxies, so you only pay for successful requests. Always use the super=true parameter for Zillow to ensure high success rates.
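A rough sketch of the "Parse Zillow Data" Code node is below. Zillow's markup changes often, so these regexes are illustrative placeholders only — verify them against the HTML that Scrape.do actually returns (assumed here in `item.json.data`, with the original URL passed through from the input sheet).

```javascript
// Illustrative parsing sketch — adjust field names and regexes to the real HTML.
return $input.all().map((item) => {
  const html = item.json.data || '';
  const first = (re) => {
    const m = html.match(re);
    return m ? m[1].trim() : 'N/A';
  };
  return {
    json: {
      URL: item.json.URLs, // column name from the input sheet (may need to be read from the earlier node)
      Price: first(/"price"\s*:\s*"?(\$?[\d,]+)"?/i),
      Address: first(/"streetAddress"\s*:\s*"([^"]+)"/i),
      City: first(/"addressLocality"\s*:\s*"([^"]+)"/i),
      State: first(/"addressRegion"\s*:\s*"([^"]+)"/i),
      Zestimate: first(/"zestimate"\s*:\s*"?(\$?[\d,]+)"?/i),
      'Scraped At': new Date().toISOString(),
    },
  };
});
```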
by Oneclick AI Squad
This n8n workflow helps users easily discover nearby residential construction projects by automatically scraping and analyzing property listings from 99acres and other real estate platforms. Users can send an email with their location preferences and receive a curated list of available properties with detailed information, including pricing, area, possession dates, and construction status.

Good to know
- The workflow focuses specifically on residential construction projects and active developments
- Property data is scraped in real time to ensure the most current information
- Results are automatically formatted and structured for easy reading
- The system handles multiple property formats and data variations from different sources
- Fallback mechanisms ensure reliable data extraction even when website structures change

How it works
1. **Trigger: New Email** – Detects incoming emails with property search requests and extracts location preferences from the email content
2. **Extract Area & City** – Parses the email body to identify target areas (e.g., Gota, Ahmedabad) and falls back to a city-level search if a specific area is not mentioned (a parsing sketch follows this description)
3. **Scrape Construction Projects** – Performs web scraping on 99acres and other property websites based on the extracted area and city information
4. **Parse Project Listings** – Cleans and formats the scraped HTML data into structured project entries with standardized fields
5. **Format Project Details** – Transforms all parsed projects into a consistent email-ready list format with bullet points and organized information
6. **Send Results to User** – Delivers a professionally formatted email with the complete list of matching construction projects to the original requester

Email Format Examples

Input Email Format

To: properties@yourcompany.com
Subject: Property Search Request

Hi, I am interested in buying a flat. Can you please send me the list of available properties in Gota, Ahmedabad?

Output Email Example

Subject: 🏘️ Property Search Results: 4 Projects Found in Gota, Ahmedabad

🏘️ Available Construction Projects in Gota, Ahmedabad
Search Area: Gota, Ahmedabad
Total Projects: 4
Search Date: August 4, 2025

📋 PROJECT LISTINGS:

🔷 Project 1
🏠 Name: Vivaan Oliver offers
🏢 BHK: 3 BHK
💰 Price: N/A
📐 Area: 851.0 Sq.Ft
🗓️ Possession: August 2025
📊 Status: under construction
📍 Location: Thaltej, Ahmedabad West
🕒 Scraped Date: 2025-08-04

🔷 Project 2
🏠 Name: Vivaan Oliver offers
🏢 BHK: 3 BHK
💰 Price: Price on Request
📐 Area: 891 Sq Ft
🗓️ Possession: N/A
📊 Status: Under Construction
📍 Location: Thaltej, Ahmedabad West
🕒 Scraped Date: 2025-08-04

🔷 Project 3
🏠 Name: It offers an exclusive range of
🏢 BHK: 3 BHK
💰 Price: N/A
📐 Area: 250 Sq.Ft
🗓️ Possession: 0 2250
📊 Status: Under Construction
📍 Location: Thaltej, Ahmedabad West
🕒 Scraped Date: 2025-08-04

🔷 Project 4
🏠 Name: N/A
🏢 BHK: 2 BHK
💰 Price: N/A
📐 Area: N/A
🗓️ Possession: N/A
📊 Status: N/A
📍 Location: Thaltej, Ahmedabad West

💡 Next Steps:
• Contact builders directly for detailed pricing and floor plans
• Schedule site visits to shortlisted properties
• Verify possession timelines and construction progress
• Compare amenities and location advantages

📞 For more information or specific requirements, reply to this email.
How to use

Setup Instructions
1. Import the workflow into your n8n instance
2. Configure Email Credentials:
   - Set up the email trigger for incoming property requests
   - Set up SMTP credentials for sending property listings
3. Configure Web Scraping:
   - Ensure proper headers and user agents for 99acres access
   - Set up fallback mechanisms for different property websites
4. Test the workflow with sample property search emails

Sending Property Search Requests
- Send an email to your configured property search address
- Include location details in natural language (e.g., "Gota, Ahmedabad")
- Optionally specify preferences like BHK, budget, or amenities
- Receive detailed property listings within minutes

Requirements
- **n8n instance** (cloud or self-hosted) with web scraping capabilities
- **Email account** with IMAP/SMTP access for automated communication
- **Reliable internet connection** for real-time property data scraping
- **Valid target websites** (99acres, MagicBricks, etc.) access

Troubleshooting
- **No properties found**: Verify the area spelling and check whether the location has active listings
- **Scraping errors**: Update user agents and headers if websites block requests
- **Duplicate results**: Implement better deduplication logic based on property names and locations
- **Email parsing issues**: Test with various email formats and improve the regex patterns
- **Website structure changes**: Implement fallback parsers and regularly monitor scraping success rates
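A minimal sketch of the "Extract Area & City" step follows. It assumes the email body is available as `item.json.text` (adjust to your email trigger's output, e.g. `textPlain`) and simply looks for an "in <Area>, <City>" phrase, falling back to a city-level search otherwise.

```javascript
// Sketch: pull "Area, City" out of the request email (assumed field names).
return $input.all().map((item) => {
  const body = (item.json.text || '').replace(/\s+/g, ' ');
  const match = body.match(/\bin\s+([A-Za-z ]+?),\s*([A-Za-z ]+?)(?:[.?!]|$)/i);
  const area = match ? match[1].trim() : null;
  const city = match ? match[2].trim() : null;
  return {
    json: {
      ...item.json,
      searchArea: area,          // e.g. "Gota"
      searchCity: city,          // e.g. "Ahmedabad"
      cityLevelFallback: !area,  // true when no specific area was mentioned
    },
  };
});
```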
by Rakin Jakaria
Who this is for

This workflow is for freelancers, job seekers, or service providers who want to automatically apply to businesses by scraping their website information, extracting contact details, and sending personalized job application emails with AI-powered content — all from one form submission.

What this workflow does

This workflow starts every time someone submits the Job Applier Form. It then:
- **Scrapes the target business website** to gather company information and contact details.
- **Converts HTML content** to readable markdown format for better AI processing.
- **Extracts email addresses** and creates a company summary using **GPT-5 AI**.
- **Validates email addresses** to ensure they contain proper formatting (@ symbol check); a stricter variant is sketched at the end of this description.
- **Accesses your experience data** from a connected **Google Sheet** with your skills and portfolio.
- **Generates personalized application emails** (subject + body) using **GPT-5** based on the job position and company info.
- **Sends the application email** automatically via **Gmail** with your name as sender.
- **Provides confirmation** through a completion form showing the AI's response.

Setup

To set this workflow up:
1. **Form Trigger** – Customize the job application form fields (Target Business Website, Applying As dropdown with positions like Video Editor, SEO Expert, etc.).
2. **OpenAI GPT-5** – Add your OpenAI API credentials for both AI models used in the workflow.
3. **Google Sheets** – Connect your sheet containing your working experience data, skills, and portfolio information.
4. **Gmail Account** – Link your Gmail account for sending application emails automatically.
5. **Experience Data** – Update the Google Sheet with your relevant skills, experience, and achievements for each job type.
6. **Sender Name** – Modify the sender name in the Gmail settings (currently set to "Jamal Mia").

How to customize this workflow to your needs
- Add more job positions to the dropdown menu (currently includes Video Editor, SEO Expert, Full-Stack Developer, Social Media Manager).
- Modify the AI prompt to reflect your unique value proposition and application style.
- Enhance email validation with additional checks like domain verification or email format patterns.
- Add follow-up scheduling to automatically send reminder emails after a certain period.
- Include attachment functionality to automatically attach your resume or portfolio to applications.
- Switch to different email providers or add multiple sender accounts for variety.
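If you want a stricter check than the simple "@ symbol" validation, a small Code node sketch is below. The field name `email` is an assumption — map it to the output of the extraction step.

```javascript
// Stricter email-validation sketch (format check only, no domain verification).
const EMAIL_RE = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

return $input.all().map((item) => {
  const email = (item.json.email || '').trim().toLowerCase();
  return {
    json: {
      ...item.json,
      email,
      emailValid: EMAIL_RE.test(email), // an IF node can branch on this flag
    },
  };
});
```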
by Wolf Bishop
A reliable, no-frills web scraper that extracts content directly from websites using their sitemaps. Perfect for content audits, migrations, and research when you need straightforward HTML extraction without external dependencies.

How It Works

This streamlined workflow takes a practical approach to web scraping by leveraging XML sitemaps and direct HTTP requests. Here's how it delivers consistent results:

- **Direct Sitemap Processing**: The workflow starts by fetching your target website's XML sitemap and parsing it to extract all available page URLs. This eliminates guesswork and ensures comprehensive coverage of the site's content structure.
- **Robust HTTP Scraping**: Each page is scraped using direct HTTP requests with realistic browser headers that mimic legitimate web traffic. The scraper includes comprehensive error handling and timeout protection to handle various website configurations gracefully.
- **Intelligent Content Extraction**: The workflow uses sophisticated JavaScript parsing to extract meaningful content from raw HTML. It automatically identifies page titles through multiple methods (title tags, Open Graph metadata, H1 headers) and converts HTML structure into readable text format; a title-extraction sketch follows this description.
- **Framework Detection**: Built-in detection identifies whether sites use WordPress, Divi themes, or heavy JavaScript frameworks. This helps explain content extraction quality and provides valuable insights about the site's technical architecture.
- **Rich Metadata Collection**: Each scraped page includes detailed metadata like word count, HTML size, response codes, and technical indicators. This data is formatted into comprehensive markdown files with YAML frontmatter for easy analysis and organization.
- **Respectful Rate Limiting**: The workflow includes a 3-second delay between page requests to respect server resources and avoid overwhelming target websites. The processing is sequential and controlled to maintain ethical scraping practices.
- **Detailed Success Reporting**: Every scraped page generates a report showing extraction success, potential issues (like JavaScript dependencies), and technical details about the site's structure and framework.

Setup Steps

1. Configure Google Drive Integration
   - Connect your Google Drive account in the "Save to Google Drive" node
   - Replace YOUR_GOOGLE_DRIVE_CREDENTIAL_ID with your actual Google Drive credential ID
   - Create a dedicated folder for your scraped content in Google Drive
   - Copy the folder ID from the Google Drive URL (the long string after /folders/)
   - Replace YOUR_GOOGLE_DRIVE_FOLDER_ID_HERE with your actual folder ID in both the folderId field and cachedResultUrl
   - Update YOUR_FOLDER_NAME_HERE with your folder's actual name
2. Set Your Target Website
   - In the "Set Sitemap URL" node, replace https://yourwebsitehere.com/page-sitemap.xml with your target website's sitemap URL
   - Common sitemap locations include /sitemap.xml, /page-sitemap.xml, or /sitemap_index.xml
   - Tip: Not sure where your sitemap is?
     Use a free online tool like https://seomator.com/sitemap-finder
   - Verify the sitemap URL loads correctly in your browser before running the workflow
3. Update Workflow IDs (Automatic)
   - When you import this workflow, n8n will automatically generate new IDs for YOUR_WORKFLOW_ID_HERE, YOUR_VERSION_ID_HERE, YOUR_INSTANCE_ID_HERE, and YOUR_WEBHOOK_ID_HERE
   - No manual changes are needed for these placeholders
4. Adjust Processing Limits (Optional)
   - The "Limit URLs (Optional)" node is currently disabled for full site scraping
   - Enable this node and set a smaller number (like 5-10) for initial testing
   - For large websites, consider running in batches to manage processing time and storage
5. Customize Rate Limiting (Optional)
   - The "Wait Between Pages" node is set to 3 seconds by default
   - Increase the delay for more respectful scraping of busy sites
   - Decrease it only if you have permission and the target site can handle faster requests
6. Test Your Configuration
   - Enable the "Limit URLs (Optional)" node and set it to 3-5 pages for testing
   - Click "Test workflow" to verify the setup works correctly
   - Check your Google Drive folder to confirm files are being created with proper content
   - Review the generated markdown files to assess content extraction quality
7. Run Full Extraction
   - Disable the "Limit URLs (Optional)" node for complete site scraping
   - Execute the workflow and monitor the execution log for any errors
   - Large websites may take considerable time to process completely (plan for several hours for sites with hundreds of pages)
8. Review Results
   - Each generated file includes technical metadata to help you assess extraction quality
   - Look for indicators like "Limited Content" warnings for JavaScript-heavy pages
   - Files include word counts and framework detection to help you understand the site's structure

Framework Compatibility: This scraper is specifically designed to work well with WordPress sites, Divi themes, and many JavaScript-heavy frameworks. The intelligent content extraction handles dynamic content effectively and provides detailed feedback about framework detection. While some single-page applications (SPAs) that render entirely through JavaScript may have limited content extraction, most modern websites, including those built with popular CMS platforms, will work well with this scraper.

Important Notes: Always ensure you have permission to scrape your target website and respect its robots.txt guidelines. The workflow includes respectful delays and error handling, but monitor your usage to maintain ethical scraping practices.
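For reference, a simplified sketch of the multi-method title extraction described under "Intelligent Content Extraction": try the `<title>` tag, then Open Graph metadata, then the first `<h1>`. It assumes the raw page HTML is available in `item.json.data`; the workflow's actual extraction code is more thorough.

```javascript
// Simplified title-extraction sketch (assumed field name: data).
return $input.all().map((item) => {
  const html = item.json.data || '';
  const pick = (re) => {
    const m = html.match(re);
    return m ? m[1].replace(/\s+/g, ' ').trim() : null;
  };
  const title =
    pick(/<title[^>]*>([\s\S]*?)<\/title>/i) ||
    pick(/<meta[^>]+property=["']og:title["'][^>]+content=["']([^"']+)["']/i) ||
    pick(/<h1[^>]*>([\s\S]*?)<\/h1>/i) ||
    'Untitled Page';
  return { json: { ...item.json, pageTitle: title } };
});
```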
by WeblineIndia
📊 Generate Weekly Energy Consumption Reports with API, Email and Google Drive

This workflow automates the process of retrieving energy consumption data, formatting it into a CSV report, and distributing it every week via email and Google Drive.

⚡ Quick Implementation Steps:
1. Import the workflow into your n8n instance.
2. Configure your API, email details and Google Drive folder.
3. (Optional) Adjust the CRON schedule if you need a different time or frequency.
4. Activate the workflow — automated weekly reports begin immediately.

🎯 Who’s It For

Energy providers, sustainability departments, facility managers, renewable energy operators.

🛠 Requirements
- n8n instance
- Energy Consumption API access
- Google Drive account
- Email SMTP access

⚙️ How It Works

The workflow triggers every Monday at 8 AM, fetches consumption data, emails the CSV report and saves a copy to Google Drive.

🔄 Workflow Steps

1. Schedule Weekly (Mon 8:00 AM)
   - Type: Cron Node
   - Runs every Monday at 8:00 AM and triggers the workflow execution automatically.

2. Fetch Energy Data
   - Type: HTTP Request Node
   - Makes a GET request to https://api.energidataservice.dk/dataset/ConsumptionDE35Hour (sample API).
   - The API returns JSON data with hourly electricity consumption in Denmark.

   Sample response structure:

   ```json
   {
     "records": [
       {
         "HourDK": "2025-08-25T01:00:00",
         "MunicipalityNo": _,
         "MunicipalityName": "Copenhagen",
         "ConsumptionkWh": 12345.67
       }
     ]
   }
   ```

3. Normalize Records
   - Type: Code Node
   - Extracts the records array from the API response and maps each entry into separate JSON items for easier handling downstream.

   Code used:

   ```javascript
   const itemlist = $input.first().json.records;
   return itemlist.map(r => ({ json: r }));
   ```

4. Convert to File
   - Type: Convert to File Node
   - Converts the array of JSON records into a CSV file.
   - The CSV is stored in a binary field called data.

5. Send Email Weekly Report
   - Type: Email Send Node
   - Sends the generated CSV file as an attachment.
   - Parameters:
     - fromEmail: Sender email address (configure in the node).
     - toEmail: Recipient email address.
     - subject: "Weekly Energy Consumption Report".
     - attachments: =data (binary data from the previous node).

6. Report File Upload to Google Drive
   - Type: Google Drive Node
   - Uploads the CSV file to your Google Drive root folder.
   - Filename pattern: energy_report_{{ $now.format('yyyy_MM_dd_HH_ii_ss') }}
   - Requires valid Google Drive OAuth2 credentials.

✨ How To Customize

Change the report frequency, email template, data format (CSV/Excel) or add-ons. A sketch for limiting the report to the past week follows this description.

➕ Add-ons
- Integration with analytics tools (Power BI, Tableau)
- Additional reporting formats (Excel, PDF)
- Slack notifications

🚦 Use Case Examples
- Automated weekly/monthly reporting for compliance
- Historical consumption tracking
- Operational analytics and forecasting

🔍 Troubleshooting Guide

| Issue | Cause | Solution |
|-------|-------|----------|
| Data not fetched | API endpoint incorrect | Verify URL |
| Email delivery issues | SMTP configuration incorrect | Verify SMTP |
| Drive save fails | Permissions/Drive ID incorrect | Check Drive permissions |

📞 Need Assistance?

Contact WeblineIndia for additional customization and support; we're happy to help.
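If you only want the most recent week of data in the report, a small Code node between "Normalize Records" and "Convert to File" could filter by the HourDK timestamp. A minimal sketch, assuming one record per item as produced by the normalize step:

```javascript
// Sketch: keep only records from the last 7 days before converting to CSV.
const cutoff = Date.now() - 7 * 24 * 60 * 60 * 1000;

return $input.all().filter((item) => {
  const ts = Date.parse(item.json.HourDK);
  return !Number.isNaN(ts) && ts >= cutoff;
});
```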