by scrapeless official
# AI-Powered Web Data Pipeline with n8n

## How It Works

This n8n workflow builds an AI-powered web data pipeline that automates the entire process of:

- **Extraction**
- **Structuring**
- **Vectorization**
- **Storage**

It integrates multiple advanced tools to transform messy web pages into clean, searchable vector databases.

## Integrated Tools

- **Scrapeless** - Bypasses JavaScript-heavy websites and anti-bot protections to reliably extract HTML content.
- **Claude AI** - Uses LLMs to analyze unstructured HTML and generate clean, structured JSON data.
- **Ollama Embeddings** - Generates local vector embeddings from structured text using the all-minilm model.
- **Qdrant Vector DB** - Stores semantic vector data for fast and meaningful search capabilities.
- **Webhook Notifications** - Sends real-time updates when workflows complete or errors occur.

From messy webpages to structured vector data — this pipeline is perfect for building intelligent agents, knowledge bases, or research automation tools.

## Setup Steps

### 1. Install n8n

> Requires Node.js v18 / v20 / v22

```bash
npm install -g n8n
n8n
```

After installation, access the n8n interface at http://localhost:5678.

### 2. Set Up Scrapeless

- Register at Scrapeless
- Copy your API token
- Paste the token into the HTTP Request node labeled "Scrapeless Web Request"

### 3. Set Up Claude API (Anthropic)

- Sign up at the Anthropic Console
- Generate your Claude API key
- Add the API key to the following nodes: Claude Extractor, AI Data Checker, Claude AI Agent

### 4. Install and Run Ollama

macOS:

```bash
brew install ollama
```

Linux:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

Windows: download the installer from https://ollama.com

Start the Ollama server and pull the embedding model:

```bash
ollama serve
ollama pull all-minilm
```

### 5. Install Qdrant (via Docker)

```bash
docker pull qdrant/qdrant
docker run -d \
  --name qdrant-server \
  -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
```

Test whether Qdrant is running:

```bash
curl http://localhost:6333/healthz
```

### 6. Configure the n8n Workflow

- Modify the trigger (manual or scheduled)
- Enter your target URLs and collection name in the designated nodes
- Paste all required API tokens/keys into their corresponding nodes
- Ensure your Qdrant and Ollama services are running

## Ideal Use Cases

- Custom AI Chatbots
- Private Search Engines
- Research Tools
- Internal Knowledge Bases
- Content Monitoring Pipelines
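Before activating the workflow, it can help to confirm that the two local services respond as expected. A minimal sketch, assuming a hypothetical collection name `web_content` and the 384-dimensional vectors that the all-minilm model typically produces; adjust both to match your workflow configuration:

```bash
# Create a Qdrant collection sized for all-minilm embeddings (384 dimensions assumed).
curl -X PUT http://localhost:6333/collections/web_content \
  -H "Content-Type: application/json" \
  -d '{"vectors": {"size": 384, "distance": "Cosine"}}'

# Ask Ollama for a test embedding to confirm the model is pulled and serving.
curl http://localhost:11434/api/embeddings \
  -d '{"model": "all-minilm", "prompt": "hello world"}'
```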
by Ranjan Dailata
## Notice

Community nodes can only be installed on self-hosted instances of n8n.

## Who this is for

The Automated Resume Job Matching Engine is an intelligent workflow designed for career platforms, HR tech startups, recruiting firms, and AI developers who want to streamline job-resume matching using real-time data from LinkedIn and job boards.

This workflow is tailored for:

- **HR Tech Founders** - Building next-gen recruiting products
- **Recruiters & Talent Sourcers** - Seeking automated candidate-job fit evaluation
- **Job Boards & Portals** - Enriching the user experience with AI-driven job recommendations
- **Career Coaches & Resume Writers** - Offering personalized job fit analysis
- **AI Developers** - Automating large-scale matching tasks using LinkedIn and job data

## What problem is this workflow solving?

Manually matching a resume to a job description is time-consuming, biased, and inefficient. In addition, accessing live job postings and candidate profiles requires overcoming web scraping limitations.

This workflow solves:

- Automated LinkedIn profile and job post data extraction using the Bright Data MCP infrastructure
- Semantic matching between job requirements and the candidate resume using OpenAI GPT-4o mini
- Pagination handling for high-volume job data
- End-to-end automation from scraping to delivery via webhook, plus persisting the matched response to disk

## What this workflow does

**Bright Data MCP for Job Data Extraction**

- Uses the Bright Data MCP client to extract multiple job listings (supports pagination)
- Pulls job data from LinkedIn using the pre-defined filtering criteria

**OpenAI GPT-4o mini Matching Engine**

- Extracts paginated job data and the textual job description from the pages scraped via the Bright Data MCP scrape_as_html tool
- The AI Job Matching node compares the job description with the candidate resume to generate match scores with insights

**Data Delivery**

- Sends the final match report to a webhook notification endpoint
- Persists the AI-matched job response to disk

## Pre-conditions

- Knowledge of the Model Context Protocol (MCP) is highly essential. Please read this blog post - model-context-protocol
- You need a Bright Data account and the setup described in the Setup section below.
- You need an OpenAI API key.
- You need to install the Bright Data MCP Server @brightdata/mcp
- You need to install n8n-nodes-mcp

## Setup

- Set up n8n locally with MCP servers by following n8n-nodes-mcp.
- Install the Bright Data MCP Server @brightdata/mcp on your local machine.
- Sign up at Bright Data.
- Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions. Name the zone mcp_unlocker.
- In n8n, configure the OpenAI account credentials.
- In n8n, configure the MCP Client (STDIO) credentials to connect with the Bright Data MCP Server as shown below. Make sure to copy the Bright Data API_TOKEN into the Environments textbox as API_TOKEN=<your-token>.
- Update the Set input fields for the candidate resume, keywords, and other filtering criteria.
- Update the Webhook HTTP Request node with the webhook endpoint of your choice.
- Update the file name and path used to persist results on disk.
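For reference, the MCP Client (STDIO) credential essentially launches the Bright Data MCP server as a local process. A rough shell equivalent under that assumption (the exact invocation depends on how the @brightdata/mcp package is installed; the token value is your own):

```bash
# Roughly what the MCP Client (STDIO) credential runs: the Bright Data MCP server
# started over stdio with your API token in the environment (assumed invocation).
API_TOKEN=<your-bright-data-token> npx -y @brightdata/mcp
```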
## How to customize this workflow to your needs

**Target Different Job Boards**

- Set the input fields to sites like Indeed, ZipRecruiter, or Monster

**Customize Matching Criteria**

- Adjust the prompt inside the AI Job Match node
- Include scoring metrics like skills match %, experience relevance, or cultural fit

**Automate Scheduling**

- Use a Cron node to periodically check for new jobs matching a profile
- Set triggers based on webhook or input form submissions

**Output Customization**

- Add Markdown/PDF formatting for report summaries
- Extend with Google Sheets export for internal analytics

**Enhance Data Security**

- Mask personal info before sending it to external endpoints
by InfraNodus
This template can be used to find the content gaps in your competitors' discourse: identifying the topics they are not yet connecting and giving you an opportunity to fill this gap with your own content and product ideas. It will also generate research questions that help bridge the gaps and spark new ideas.

The template showcases several n8n nodes and processes:

- enriching a Google Sheets file with new data
- data extraction
- content enhancement using the GraphRAG approach
- content gap / research question generation

This approach can be very useful for research, marketing, and SEO applications, as you can quickly get an overview of the main topics available online for a certain niche and understand what is missing.

## What are Content Gaps in Marketing and SEO?

In SEO, content gaps are usually understood as the topics your competitors rank for but you do not. However, it is hard to rank for these topics because competition is very high. A much more effective approach is to identify the gaps between the topics your competitors talk about that are not yet bridged in their discourse. If you address these gaps in your content, you increase the informational gain your content offers and provide a novel perspective while still touching upon the topics that are relevant in your field.

For example, if we analyze the top websites for "body and physical practices, fitness, etc.", we will see that most of them talk about the health and fitness aspects, and another big topic is the community aspect. However, there is a gap between the two topics: most of the websites (companies) in this space don't mention the two in the same context. This might be an opportunity: bridging the gap between health and fitness while emphasizing the community aspect that comes with a collective practice.

## How it works

This template consists of two stages:

1. Data enrichment of a Google Sheets file containing a list of your competitors, using InfraNodus GraphRAG to generate topical summaries and graph summaries for every URL you analyze.
2. Insight generation, using InfraNodus to identify the main topical clusters and gaps in those summaries; these insights are then added to the Google Sheets file.

Additionally, it contains a sub-workflow that you can activate and launch to ask a Perplexity model to conduct market research, find the companies that operate in your field, and populate the original Google Sheets file.

Here's a step-by-step description:

- Step 0: Populate the Google Sheets file with the company data (either manually, using the sub-workflow provided, or with Manus AI / Deep Research)
- Steps 1-2: Trigger and launch the workflow, extracting the company URL from the Google Sheets row
- Step 3: Scrape the URL content from the companies' websites and clean the data
- Steps 5-7: Use the InfraNodus GraphRAG content enhancer to get a topical summary and a graph summary
- Steps 8-10: Use InfraNodus AI to generate insight advice and research questions based on the content gaps

## How to use

You need an InfraNodus GraphRAG API account and key to use this workflow.

- Create an InfraNodus account
- Get the API key at https://infranodus.com/api-access and create a Bearer authorization key for the InfraNodus HTTP nodes
- Create a separate knowledge graph for each expert (using the PDF / content import options) in InfraNodus
- For each graph, go to the workflow and paste the name of the graph into the body name field
- Keep other settings intact or learn more about them on the InfraNodus access points page
- Once you add one or more graphs as experts to your flow, add the LLM key to the OpenAI node and launch the workflow

## Requirements

- An InfraNodus account and API key
- A Google Sheets account and an authorization key

Note: an OpenAI key is not required, but you might want to get a Perplexity AI key if you'd like to use the sub-workflow that populates the Google Sheet with your competitors' website addresses (if you don't have this list yet).

## Customizing this workflow

You can use this same workflow with a Telegram bot or Slack (to be notified of the summaries and ideas). You can also hook up automated social media content creation workflows at the end of this template, so you can generate posts that are relevant (covering the important topics in your niche) but also novel (because they connect them in a new way).

- Check out our n8n templates for ideas at https://n8n.io/creators/infranodus/
- Check out the complete guide at https://support.noduslabs.com/hc/en-us/articles/20234254556828-Find-Content-Gaps-in-Websites-Market-Research-and-SEO-n8n-Workflow
- Also check the full tutorial with a conceptual explanation at https://support.noduslabs.com/hc/en-us/articles/20454382597916-Beat-Your-Competition-Target-Their-Content-Gaps-with-this-n8n-Automation-Workflow
- Also check out the video tutorial with a demo.

For support and help with this workflow, please contact us at https://support.noduslabs.com
by InfraNodus
This template can be used to upload the files in your Google Drive to an InfraNodus knowledge graph. The InfraNodus graph will then reveal the main topics and ideas in your collection of documents and show the content gaps in them. You can also use the built-in AI to converse with the documents.

You can also access the InfraNodus graphs via the GraphRAG API to re-use them in your other n8n workflows for high-quality content retrieval and knowledge base optimization.

The template showcases several n8n nodes and processes:

- extracting documents from a Google Drive folder
- text extraction
- optional: high-quality PDF conversion using ConvertAPI
- InfraNodus knowledge graph generation

Note: if you want to **sync** your Google Drive to an InfraNodus graph, check out our other workflow.

## How it works

Here's a description of this workflow step by step:

1. Find all the files in a specific Google Drive folder.
2. For each file found, reiterate the workflow and identify the type of the file (TXT, PDF, Markdown).
3. For TXT and Markdown files, extract the text data.
4. For PDF files, use a PDF-to-text converter to extract the text data (optional: use ConvertAPI for better-quality PDF conversion).
5. Forward everything to the InfraNodus graphAndStatements API endpoint with the name of the new graph, the text field with the text data, the text settings, and doNotSave=false to create a new graph.
6. Reiterate through another file.

## How to use

You need an InfraNodus GraphRAG API account and key to use this workflow.

- Create an InfraNodus account.
- Get the API key at https://infranodus.com/api-access and create a Bearer authorization key for the InfraNodus HTTP nodes.
- Use that API key to set up authorization for the InfraNodus tool in the workflow.
- If you want to upload the files to an existing graph, copy its name from InfraNodus. Otherwise you can specify any name you want.

## Requirements

- An InfraNodus account and API key
- A Google Drive account and authorization (you will need to set it up via Google Cloud using the n8n instructions provided in the Google Drive node)

## Customizing this workflow

You can use Dropbox instead of Google Drive. You can also modify this workflow slightly to make it sync with a Google Drive folder when new files appear in it.

Check out the complete guide at https://support.noduslabs.com/hc/en-us/articles/20267019838108-Upload-Sync-Your-Google-Drive-Folder-with-InfraNodus-using-n8n
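For orientation, the graphAndStatements call made by the HTTP node might look roughly like the following. This is a hedged sketch: the Bearer header follows the API-access instructions above, while the exact base URL, method, and field names beyond those mentioned in the description (name, text, doNotSave) are assumptions that should be checked against the InfraNodus API documentation.

```bash
# Hedged sketch of the graphAndStatements request (verify path and field names against the InfraNodus API docs).
curl -X POST "https://infranodus.com/api/v1/graphAndStatements?doNotSave=false" \
  -H "Authorization: Bearer <your-infranodus-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "my_drive_documents",
        "text": "Extracted text from one of the files goes here."
      }'
```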
by Ranjan Dailata
## Notice

Community nodes can only be installed on self-hosted instances of n8n.

## Who this is for

The Brave Search Structured Data Extractor workflow is designed for professionals and teams that need high-quality, structured insights from Brave search results in real time. Whether you're performing market research, tracking competitors, training AI models, or powering content engines, this workflow offers a robust and automated solution.

This workflow is tailored for:

- **Market Researchers** - Who analyze trends across multimedia channels
- **AI Developers** - Who require clean, structured datasets for model fine-tuning
- **SEO & Content Analysts** - Looking to monitor visibility across news, images, and videos
- **Media Researchers** - Curating timely and relevant information across formats
- **Automation Engineers** - Integrating search insights into downstream workflows

## What problem is this workflow solving?

Traditional web scraping and search result parsing is fragmented, inconsistent, and error-prone, especially when dealing with multimedia (images, videos, news) data from search engines.

This workflow provides:

- Centralized Brave search data extraction across all content types
- A switch over the search execution based on the configured search type (news, images, videos, or all)
- Automated structured data transformation using Google Gemini
- Unified output persistence and notification across disk, webhook, and Google Sheets

## What this workflow does

**Input Configuration**

- Define your Brave search query
- Set the search type: videos, images, news, or all
- Configure your Bright Data MCP zone

**Bright Data MCP Search Execution**

- Initiates a Brave search via Bright Data MCP using the correct URL pattern for each search type (see the URL-pattern sketch after this description)
- Returns the raw HTML of the search results

**Google Gemini LLM Structured Data Extraction**

- Transforms raw results into structured data (e.g., title, URL, source, snippet)

**Output Handling**

- Save to disk (e.g., JSON or CSV file)
- Send a webhook notification with the structured data (e.g., Slack, internal dashboards)
- Store in Google Sheets for team-wide access or dashboarding

## Pre-conditions

- Knowledge of the Model Context Protocol (MCP) is highly essential. Please read this blog post - model-context-protocol
- You need a Bright Data account and the setup described in the Setup section below.
- You need a Google Gemini API key. Visit Google AI Studio.
- You need to install the Bright Data MCP Server @brightdata/mcp
- You need to install n8n-nodes-mcp

## Setup

- Set up n8n locally with MCP servers by following n8n-nodes-mcp.
- Install the Bright Data MCP Server @brightdata/mcp on your local machine.
- Sign up at Bright Data.
- Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions. Name the zone mcp_unlocker.
- In n8n, configure the Google Gemini(PaLM) Api account with your Google Gemini API key (or access through Vertex AI or a proxy).
- In n8n, configure the MCP Client (STDIO) credentials to connect with the Bright Data MCP Server as shown below. Make sure to copy the Bright Data API_TOKEN into the Environments textbox as API_TOKEN=<your-token>.

## How to customize this workflow to your needs

**Enhance Output Analysis**

- Add additional LLM prompts for topic classification, sentiment scoring, or trend forecasting.

**Output Format Options**

- Choose to output CSV, Markdown, or HTML reports based on your integration target.

**Schedule Automation**

- Trigger the workflow on a schedule (daily/weekly) to keep monitoring topical content.
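As a rough illustration of what the per-type switch selects between, these are the kinds of Brave search URL patterns the MCP search step can target. The exact paths are assumptions for the sketch; check the URLs configured in the workflow's MCP node.

```bash
# Illustrative Brave search URL patterns per search type (paths are assumptions; verify against the workflow).
QUERY="renewable+energy"
echo "all:    https://search.brave.com/search?q=${QUERY}"
echo "news:   https://search.brave.com/news?q=${QUERY}"
echo "images: https://search.brave.com/images?q=${QUERY}"
echo "videos: https://search.brave.com/videos?q=${QUERY}"
```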
by InfraNodus
This template can be used to sync the files in your Google Drive to a new or existing InfraNodus knowledge graph. The InfraNodus graph will then reveal the main topics and ideas in your collection of documents and show the content gaps in them. You can also use the built-in AI to converse with the documents.

You can also access the InfraNodus graphs via the GraphRAG API to re-use them in your other n8n workflows for high-quality content retrieval and knowledge base optimization.

The template showcases several n8n nodes and processes:

- syncing documents from a Google Drive folder / extracting them
- text extraction from files
- optional: high-quality PDF conversion using ConvertAPI
- InfraNodus knowledge graph generation

Note: if you want to **upload** files from your Google Drive to an InfraNodus graph, check out our other workflow.

## How it works

Here's a description of this workflow step by step:

1. Wait for new file(s) to appear in the Google Drive folder.
2. Reiterate through each file and retrieve the new file from Google Drive.
3. For each file found, reiterate the workflow and identify the type of the file (TXT, PDF, Markdown).
4. For TXT and Markdown files, extract the text data.
5. For PDF files, use a PDF-to-text converter to extract the text data (optional: use ConvertAPI for better-quality PDF conversion).
6. Forward everything to the InfraNodus graphAndStatements API endpoint with the name of the new graph, the text field with the text data, the text settings, and doNotSave=false to create a new graph.
7. Reiterate through another file.

## How to use

You need an InfraNodus GraphRAG API account and key to use this workflow.

- Create an InfraNodus account.
- Get the API key at https://infranodus.com/api-access and create a Bearer authorization key for the InfraNodus HTTP nodes.
- Use that API key to set up authorization for the InfraNodus tool in the workflow.
- If you want to upload the files to an existing graph, copy its name from InfraNodus. Otherwise you can specify any name you want.

## Requirements

- An InfraNodus account and API key
- A Google Drive account and authorization (you will need to set it up via Google Cloud using the n8n instructions provided in the Google Drive node)

## Customizing this workflow

You can use Dropbox instead of Google Drive. You can also modify this workflow slightly to make it upload the files from a Google Drive folder when new files appear in it.

Check out the complete guide at https://support.noduslabs.com/hc/en-us/articles/20267019838108-Upload-Sync-Your-Google-Drive-Folder-with-InfraNodus-using-n8n
by Samir Saci
Tags: Scraping, Events, European Union, Networking

## Context

Hey! I'm Samir, a Supply Chain Engineer and Data Scientist from Paris, and the founder of LogiGreen Consulting. We use AI, automation, and data to support sustainable and data-driven operations across all types of organizations.

This workflow is part of our networking strategy (as a business) to track official EU events that may relate to topics we cover.

> Want to stay ahead of critical EU meetings and events without checking the website every day?

This n8n workflow automatically scrapes the EU's official event portal and logs the latest entries with clean metadata including date, location, category, and link.

📬 For collaborations, feel free to connect with me on LinkedIn

## Who is this template for?

This workflow is useful for:

- **Policy & public affairs teams** following institutional activities
- **Sustainability teams** watching for relevant climate-related summits
- **NGOs and researchers** interested in event calendars
- **Data teams** building dashboards on public event trends

## What does it do?

This n8n workflow:

- 🌐 Scrapes the EU events portal for new meetings and conferences
- 📅 Extracts event metadata (title, date, location, type, and link)
- 🔁 Handles pagination across multiple pages
- 🚫 Checks for duplicates already stored
- 📊 Saves new records into a connected Google Sheet

## How it works

1. Triggered daily via cron
2. An HTTP node loads the event listing HTML
3. HTML blocks are extracted for each event article
4. The event name, link, type, location, and full date are parsed
5. Dates are concatenated and cleaned for easy tracking
6. Non-duplicate entries are stored in Google Sheets

The workflow uses static data to track pagination and ensure only new events are stored, making it ideal for building up a clean dataset over time.

## What do I need to get started?

You'll need:

- A Google Sheet connected to your n8n instance
- No code or AI tools needed — just n8n and this template

## Follow the Guide!

Sticky notes are included directly inside the workflow to guide you step-by-step through setup and customisation.

🎥 Watch My Tutorial

## Notes

- This is ideal for analysts and consultants who want clean, structured data from the EU portal
- You can add filtering, email alerts, or AI classifiers later
- This workflow was built using n8n version 1.93.0

Submitted: June 1, 2025
by Naveen Choudhary
## Who is this for?

Marketing agencies, sales teams, lead generation specialists, and business development professionals who need to build comprehensive business databases with contact information for outreach campaigns across any industry.

## What problem is this workflow solving?

Finding businesses and their contact details manually is time-consuming and inefficient. This workflow automates the entire process of discovering businesses through Google Maps and extracting their digital contact information from websites, saving hours of manual research.

## What this workflow does

This automated workflow runs every 30 minutes to:

1. Scrape business data from Google Maps using Apify's Google Places crawler
2. Save basic business information (name, address, phone, website) to Google Sheets
3. Filter businesses that have websites
4. Scrape each business's website content using Firecrawl
5. Extract contact information including emails, LinkedIn, Facebook, Instagram, and Twitter profiles
6. Store all extracted data in organized Google Sheets for easy access and follow-up

## Setup

**Required Services:**

- Google Sheets account with OAuth2 setup
- Apify account with API access for Google Places scraping
- Firecrawl account with API access for website scraping

**Pre-setup:**

- Copy this Google Sheet
- Configure your Apify and Firecrawl API credentials in n8n
- Set up the Google Sheets OAuth2 connection
- Update the Google Sheet ID in all Google Sheets nodes

**Quick Start:** The workflow includes detailed sticky notes explaining each phase. Simply configure your API credentials and Google Sheet, then activate the workflow.

## How to customize this workflow to your needs

- **Change search criteria**: Modify the Apify scraping parameters to target different business types (restaurants, gyms, salons, etc.) or locations
- **Adjust schedule**: Change the trigger interval from 30 minutes to your preferred frequency
- **Add more contact fields**: Extend the extraction code to find additional contact information like WhatsApp or Telegram
- **Filter criteria**: Modify the filter conditions to target businesses with specific characteristics
- **Batch size**: Adjust the batch processing to handle more or fewer websites simultaneously

Perfect for lead generation, competitor research, and building targeted marketing lists across any industry or business type.
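To illustrate the contact-extraction step (step 5 above), here is a minimal sketch of pulling email addresses and social profile links out of a scraped page saved locally. The filename `page.md` and the exact patterns are assumptions; the workflow's own extraction code may differ.

```bash
# Hypothetical: extract contact hints from a page scraped with Firecrawl and saved as page.md.
grep -Eio '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}' page.md | sort -u                                   # email addresses
grep -Eio 'https?://(www\.)?(linkedin|facebook|instagram|twitter|x)\.com/[^" )]+' page.md | sort -u   # social profile links
```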
by Ranjan Dailata
## Who this is for?

The LinkedIn Profile Extract and JSON Resume Builder is a powerful workflow that scrapes professional profile data from LinkedIn using Bright Data's infrastructure, then transforms that data into a clean, structured JSON resume using Google Gemini. The workflow is ideal for automating resume parsing, candidate profiling, or integration into recruiting platforms.

This workflow is tailored for:

- HR professionals & recruiters automating resume screening
- Talent acquisition platforms enriching candidate profiles
- Developers & AI builders creating resume-parsing AI pipelines
- Data scientists working on labor market analytics
- Growth hackers profiling prospects via public data

## What problem is this workflow solving?

Parsing resumes or LinkedIn profiles into machine-readable formats is often a manual, error-prone process. Most scraping tools either fail due to anti-bot protections or return unstructured HTML that's hard to work with.

This workflow solves that by:

- Using Bright Data's Web Unlocker for reliable, CAPTCHA-free LinkedIn scraping
- Extracting clean text and structured profile data via the Google Gemini LLM
- Automatically generating a standards-compliant JSON resume and skills list
- Sending the resume to webhooks or storing it for downstream usage

## What this workflow does

1. Accepts a LinkedIn profile URL and required metadata (Bright Data zone, webhook)
2. Scrapes the LinkedIn profile using Bright Data Web Unlocker
3. Extracts clean content and skills using the Google Gemini LLM
4. Builds a JSON-formatted resume following the JSON Resume schema
5. Sends the JSON resume via webhook notification
6. Persists the output by saving the file to disk

## Setup

- Sign up at Bright Data.
- Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions.
- In n8n, configure the Header Auth account under Credentials (Generic Auth Type: Header Authentication). Set the Value field to Bearer XXXXXXXXXXXXXX, replacing XXXXXXXXXXXXXX with your Web Unlocker token.
- In n8n, configure the Google Gemini(PaLM) Api account with your Google Gemini API key (or access through Vertex AI or a proxy).
- Update the Set URL and Bright Data Zone node with the LinkedIn profile, the Bright Data zone, and the webhook notification URL. For testing purposes, you can obtain a webhook URL using https://webhook.site/

## How to customize this workflow to your needs

- **Add Language Translation**: Insert a translation LLM node to support multilingual profiles.
- **Generate PDF Resumes**: Convert JSON to formatted PDF resumes using an HTML-to-PDF module.
- **Push to ATS or CRM**: Add integration nodes to pipe data into applicant tracking systems (ATS), CRMs, or databases.
- **Use Alternative LLMs**: Swap Gemini with OpenAI or Anthropic Claude if preferred.
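For a quick test of the delivery step without the rest of your stack, you can post a minimal JSON Resume-shaped payload to a temporary webhook.site endpoint, roughly mirroring what the workflow's webhook notification sends. All values below are placeholders, and only a few JSON Resume sections (basics, work, skills) are shown.

```bash
# Hypothetical payload illustrating the JSON Resume structure the workflow targets.
curl -X POST "https://webhook.site/<your-unique-id>" \
  -H "Content-Type: application/json" \
  -d '{
        "basics": { "name": "Jane Doe", "label": "Data Engineer", "summary": "Placeholder summary." },
        "work":   [ { "name": "Example Corp", "position": "Engineer", "startDate": "2021-03" } ],
        "skills": [ { "name": "Python", "level": "Advanced" } ]
      }'
```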
by Roman Rozenberger
## How it works

- **Extract AI Overviews from Google Search** - Receives data from the browser extension via webhook
- **Convert HTML to Markdown** - Automatically processes and cleans the AI Overview content
- **Store in Google Sheets** - Archives all extracted AI Overviews with metadata and sources
- **Generate SEO Guidelines** - AI analyzes your page content vs. the AI Overview to suggest improvements
- **Automate Analysis** - Batch-processes multiple URLs and schedules regular checks

## Set up steps

- **Import workflow** - Load the JSON template into your n8n instance (2 minutes)
- **Configure Google Sheets** - Set up the OAuth connection and create a spreadsheet with the required columns (5 minutes)
- **Set up AI provider** - Add OpenRouter API credentials for Gemini 2.5 Pro (3 minutes)
- **Install browser extension** - Deploy the companion Chrome/Firefox extension for data extraction (5 minutes)
- **Test webhook endpoint** - Verify the connection between the extension and the n8n workflow (2 minutes)

Total setup time: ~15 minutes

**What you'll need:**

- Google account for Sheets integration
- Google Sheet template with the required columns
- OpenRouter API key for Gemini 2.5 Pro model access
- Browser extension: Chrome Extension or Firefox Add-on
- n8n instance (local or cloud)

**Use cases:**

- **SEO agencies** - Monitor AI Overview presence for client keywords
- **Content marketers** - Analyze what content gets featured in AI Overviews
- **E-commerce** - Track AI Overview coverage for product-related searches
- **Research** - Build datasets of AI Overview content across different topics

The workflow comes with a free browser extension (Chrome | Firefox) that automatically extracts AI Overview content from Google Search and sends it via webhook to your n8n workflow for processing and analysis.

GitHub Repository: https://github.com/romek-rozen/ai-overview-extractor/

## Detailed Setup Instructions - AI Overview Extractor

### Prerequisites

- **n8n instance** (local or cloud) - version 1.95.3+
- **Google account** for Sheets integration
- **OpenRouter API account** for Gemini 2.5 Pro access
- **Browser** (Chrome/Firefox) for the extension

### Step 1: Import the Workflow

1. Open n8n and navigate to Workflows
2. Click "Add workflow" → "Import from JSON"
3. Upload the AI_OVERVIES_EXTRACTOR_TEMPLATE.json file
4. Save the workflow

### Step 2: Configure Google Sheets

**Create the Google Sheets document**

Create a new Google Sheet with these columns:

extractedAt | searchQuery | sources | markdown | myURL | task | guidelines | key

Here is a public Google Sheet template: https://docs.google.com/spreadsheets/d/15xqZ2dTiLMoyICYnnnRV-HPvXfdgVeXowr8a7kU4uHk/edit?gid=0#gid=0

Copy the Google Sheets URL (you'll need it for the workflow).

**Set up Google Sheets credentials**

1. In n8n, go to Settings → Credentials
2. Click "Add credential" → "Google Sheets OAuth2 API"
3. Follow the OAuth setup to authorize n8n access to Google Sheets
4. Name the credential (e.g., "Google Sheets AI Overview")

**Configure the Google Sheets nodes**

Update these nodes with your Google Sheets URL:

- Get URLs to Analyze
- Save AI Overview to Sheets
- Save SEO Guidelines to Sheets

In each node:

- Set documentId to your Google Sheets URL
- Set sheetName to your Google Sheets URL
- Select your Google Sheets credential

### Step 3: Configure the AI Provider (OpenRouter)

**Get an OpenRouter API key**

1. Sign up at https://openrouter.ai/
2. Generate an API key in your account settings
3. Add credits to your account

**Set up OpenRouter credentials**

1. In n8n, go to Settings → Credentials
2. Click "Add credential" → "OpenRouter API"
3. Enter your API key
4. Name the credential (e.g., "OpenRouter AI Overview")

**Configure the OpenRouter node**

1. Select the Gemini 2.5 Pro Model node
2. Choose your credential from the dropdown
3. Verify the model (default: google/gemini-2.5-pro-preview)

### Step 4: Install the Browser Extension

**Install in Chrome (official extension, recommended)**

1. Visit: https://chromewebstore.google.com/detail/ai-overview-extractor/cbkdfibgmhicgnmmdanlhnebbgonhjje
2. Click "Add to Chrome"

**Install in Firefox (official add-on)**

1. Visit: https://addons.mozilla.org/en-US/firefox/addon/ai-overview-extractor/
2. Click "Add to Firefox"

### Step 5: Configure the Webhook Connection

**Get the webhook URL**

1. In the n8n workflow, click on the Webhook node
2. Copy the webhook URL (it should look like: http://localhost:5678/webhook/ai-overview-extractor-template-123456789)

**Configure the extension**

1. Go to Google Search and perform any search that returns an AI Overview
2. Click the browser extension button (AI Overview Extractor)
3. In the webhook configuration section, paste your webhook URL
4. Click "Test" - it should show ✅ Test successful
5. Click "Save" to store the configuration

### Step 6: Activate and Test

**Activate the workflow**

1. In n8n, toggle the workflow to "Active" (top right switch)
2. Verify all nodes are properly configured

**Test end-to-end**

1. Go to Google Search
2. Search for something that shows an AI Overview
3. Use the extension to extract the AI Overview
4. Send it via webhook and check your Google Sheets for the data
5. Verify the Markdown conversion worked correctly

### Optional: Batch Analysis Setup

For the SEO analysis features:

1. In your Google Sheets, add URLs in the myURL column
2. Set the task column to "create guidelines"
3. Run the workflow manually or wait for the 15-minute scheduler
4. Check the guidelines column for AI-generated SEO recommendations

### Troubleshooting

**Webhook issues**

- Ensure n8n is running on port 5678
- Check whether the workflow is activated
- Verify the webhook URL format

**Google Sheets errors**

- Confirm the OAuth credentials are working
- Check the sheet URL format
- Verify column names match exactly
- Ensure the Get URLs to Analyze, Save AI Overview to Sheets, and Save SEO Guidelines to Sheets nodes are properly configured

**OpenRouter issues**

- Check API key validity
- Ensure sufficient account credits
- Try different models if Gemini 2.5 Pro fails
- Verify the Gemini 2.5 Pro Model node is properly connected

**Extension problems**

- Check the browser console for errors
- Verify the extension is properly installed
- Ensure you're on google.com/search pages
- Confirm the webhook URL is correctly configured in the extension

### Next Steps

- **Customize AI prompts** in the Generate SEO Recommendations node for your specific needs
- **Adjust the scheduler frequency** (default: 15 minutes)
- **Add more URL analysis** by populating Google Sheets
- **Monitor usage** and API costs

### Support

- **GitHub Issues**: https://github.com/romek-rozen/ai-overview-extractor/issues
- **n8n Community**: https://community.n8n.io/
- **Template documentation**: check the included README files
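If you want to exercise the webhook before installing the extension (Step 5 above), you can post a sample payload by hand. This is a hedged sketch: the webhook path is the example shown above, and the field names simply mirror the spreadsheet columns; they are assumptions about what the extension actually sends.

```bash
# Hypothetical test payload; field names follow the sheet columns (searchQuery, markdown, sources).
curl -X POST "http://localhost:5678/webhook/ai-overview-extractor-template-123456789" \
  -H "Content-Type: application/json" \
  -d '{
        "searchQuery": "best running shoes",
        "markdown": "## AI Overview\nSample extracted content...",
        "sources": ["https://example.com/article"]
      }'
```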
by Ranjan Dailata
## Notice

Community nodes can only be installed on self-hosted instances of n8n.

## Who this is for

This workflow automates the real-time extraction of job descriptions and salary information from job listing pages using Bright Data MCP, and analyzes the content using OpenAI GPT-4o mini.

This workflow is ideal for:

- **Recruiters & HR Tech Startups**: Automate job data collection from public listings
- **Market Intelligence Teams**: Analyze compensation trends across companies or geographies
- **Job Boards & Aggregators**: Power search results with structured, enriched listings
- **AI Workflow Builders**: Extend to other career platforms or automate resume-job match analysis
- **Analysts & Researchers**: Track hiring signals and salary benchmarks in real time

## What problem is this workflow solving?

Traditional scraping of job portals is challenging due to cluttered content, anti-scraping measures, and inconsistent formatting. Manually analyzing salary ranges and job descriptions is tedious and error-prone.

This workflow solves the problem by:

- Simulating user behavior with the Bright Data MCP client to bypass anti-scraping systems
- Extracting structured, clean job data in Markdown format
- Using OpenAI GPT-4o mini to analyze and extract precise salary details and refined job descriptions
- Merging and formatting the result for easy consumption
- Delivering the final output via webhook, Google Sheets, or the file system

## What this workflow does

**Input Nodes**

- job_search_url: the job listing or search result URL
- job_role: the title or role being searched for (used in logging/formatting)

**MCP Client Operations**

- MCP Salary Data Extractor: simulates browser behavior and scrapes salary-related content (if available)
- MCP Job Description Extractor: extracts the full job description as structured Markdown content

**OpenAI GPT-4o mini Nodes**

- Salary Information Extractor: uses GPT-4o mini to detect, clean, and standardize salary range data (if any)
- Job Description Refiner: extracts role responsibilities, qualifications, and benefits from unstructured text
- Company Information Extractor: uses Bright Data MCP and GPT-4o mini to extract the company information

**Merge and Aggregate**

- Merge node: combines the refined job description and extracted salary information into a unified JSON response object
- Aggregate node: aggregates the job description and salary information into a single JSON response object

**Final Output Handling**

The output is handled in three different formats depending on your downstream needs:

- **Save to Disk**: output stored with a filename that includes the timestamp and job role
- **Google Sheet Update**: adds a new row with job role, salary, summary, and link
- **Webhook Notification**: pushes the merged response to an external system

## Pre-conditions

- Knowledge of the Model Context Protocol (MCP) is highly essential. Please read this blog post - model-context-protocol
- You need a Bright Data account and the setup described in the Setup section below.
- You need an OpenAI API key.
- You need to install the Bright Data MCP Server @brightdata/mcp
- You need to install n8n-nodes-mcp

## Setup

- Set up n8n locally with MCP servers by following n8n-nodes-mcp.
- Install the Bright Data MCP Server @brightdata/mcp on your local machine.
- Sign up at Bright Data.
- Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions. Name the zone mcp_unlocker.
- In n8n, configure the OpenAI account credentials.
- In n8n, configure the MCP Client (STDIO) credentials to connect with the Bright Data MCP Server as shown below. Make sure to copy the Bright Data API_TOKEN into the Environments textbox as API_TOKEN=<your-token>.

## How to customize this workflow to your needs

**Modify the Input Source**

- Change job_search_url to point to any job board or aggregator
- Customize job_role to reflect the type of jobs being analyzed

**Tweak the LLM Prompts (Optional)**

- Refine the GPT-4o mini prompts to extract additional fields like benefits, tech stacks, or remote eligibility

**Change the Output Format**

- Customize the merged object to output JSON, CSV, or Markdown based on downstream needs
- Add additional destinations (e.g., Slack, Airtable, Notion) via n8n nodes
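As an illustration of the save-to-disk step described above (timestamp plus job role in the filename), here is a small sketch; the variable names, filename pattern, and the fields of the merged object are assumptions borrowed from the Google Sheets columns.

```bash
# Hypothetical naming convention: timestamp plus job role in the filename.
JOB_ROLE="Data Engineer"
OUT="job_match_$(date +%Y%m%d_%H%M%S)_${JOB_ROLE// /_}.json"

# Minimal stand-in for the merged response object (fields assumed from the sheet columns).
cat > "$OUT" <<'EOF'
{ "job_role": "Data Engineer", "salary": "$120k-$150k", "summary": "...", "link": "https://example.com/job/123" }
EOF
echo "Saved $OUT"
```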
by Ranjan Dailata
## Notice

Community nodes can only be installed on self-hosted instances of n8n.

## Who this is for

The DNB Company Search & Extract workflow is designed for professionals who need to gather structured business intelligence from Dun & Bradstreet (DNB). It is ideal for:

- Market Researchers
- B2B Sales & Lead Generation Experts
- Business Analysts
- Investment Analysts
- AI Developers Building Financial Knowledge Graphs

## What problem is this workflow solving?

Gathering business information from the DNB website usually involves manual browsing, copying company details, and organizing them in spreadsheets. This workflow automates the entire data collection pipeline, from searching DNB via Google and scraping the relevant pages to structuring the data and saving it in usable formats.

## What this workflow does

This workflow performs automated search, scraping, and structured extraction of DNB company profiles using Bright Data's MCP search agents and OpenAI's GPT-4o mini model. Here's what it includes:

- **Set Input Fields**: Provide search_query and webhook_notification_url.
- **Bright Data MCP Client (Search)**: Performs a Google search for the DNB company URL.
- **Markdown Scrape from DNB**: Scrapes the company page using Bright Data and returns it as Markdown.
- **OpenAI LLM Extraction**: Transforms the Markdown into clean structured data and extracts the business information (company name, size, address, industry, etc.).
- **Webhook Notification**: Sends the structured response to your provided webhook.
- **Save to Disk**: Persists the structured data locally for logging or auditing.

## Pre-conditions

- Knowledge of the Model Context Protocol (MCP) is highly essential. Please read this blog post - model-context-protocol
- You need a Bright Data account and the setup described in the Setup section below.
- You need an OpenAI API key.
- You need to install the Bright Data MCP Server @brightdata/mcp
- You need to install n8n-nodes-mcp

## Setup

- Set up n8n locally with MCP servers by following n8n-nodes-mcp.
- Install the Bright Data MCP Server @brightdata/mcp on your local machine.
- Sign up at Bright Data.
- Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions. Name the zone mcp_unlocker.
- In n8n, configure the OpenAI account credentials.
- In n8n, configure the MCP Client (STDIO) credentials to connect with the Bright Data MCP Server as shown below. Make sure to copy the Bright Data API_TOKEN into the Environments textbox as API_TOKEN=<your-token>.
- Update the Set input fields for search_query and webhook_notification_url.
- Update the file name and path used to persist results on disk.

## How to customize this workflow to your needs

- **Search Engine**: The default is Google, but you can change the MCP client engine to Bing or Yandex if needed.
- **Company Scope**: Modify the search query logic for niche filtering, e.g., "biotech startups site:dnb.com".
- **Structured Fields**: Customize the LLM prompt to extract additional fields like CEO name, revenue, or ratings.
- **Integrations**: Push output to Notion, Airtable, or CRMs like HubSpot using additional n8n nodes.
- **Formatting**: Convert the output to PDF or CSV using the built-in File and Spreadsheet nodes.
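For clarity on the extraction target, the structured company record that the LLM step is prompted to return might look roughly like this. Every field name and value below is an illustrative assumption, not the workflow's exact schema.

```bash
# Illustrative shape of the extracted DNB company record (all fields are assumptions).
cat <<'EOF' > dnb_company_example.json
{
  "company_name": "Example Manufacturing Inc.",
  "industry": "Industrial Machinery",
  "company_size": "201-500 employees",
  "address": "123 Main Street, Springfield, US",
  "website": "https://www.example.com"
}
EOF
```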