AI Website Scraper & Company Intelligence
Description
This workflow turns any website URL into a structured company profile.
It's triggered by a form, allowing a user to submit a website and choose between a "basic" or "deep" scrape.
The workflow extracts key information (mission, services, contacts, SEO keywords), stores it in a structured Supabase database, and archives a full JSON backup to Google Drive.
It also features a secondary AI agent that automatically finds and saves competitors for each company, building a rich, interconnected database of company intelligence.
Quick Implementation Steps
Import the Workflow: Import the provided JSON file into your n8n instance.
Install Custom Community Node:
Install the community node from npm:
👉 https://www.npmjs.com/package/n8n-nodes-crawl-and-scrape
Firecrawl n8n documentation:
👉 https://docs.firecrawl.dev/developer-guides/workflow-automation/n8n
Install Additional Nodes:
Alongside n8n-nodes-crawl-and-scrape, install n8n-nodes-mcp, which the Firecrawl MCP agent uses.
Set up Credentials:
Create credentials in n8n for the Firecrawl API, Supabase, Mistral AI, and Google Drive.
Configure API Key (CRITICAL):
Open the Web Search tool node.
Go to Parameters → Headers and replace the hardcoded Tavily AI API key with your own.
Configure Supabase Nodes:
Assign your Supabase credential to all Supabase nodes.
Ensure table names (e.g., companies, competitors) match your schema.
Configure Google Drive Nodes:
Assign your Google Drive credential to the “Google Drive2” and “save to Google Drive1” nodes.
Select the correct Folder ID.
Activate Workflow:
Turn on the workflow and open the Webhook URL in the “On form submission” node to access the form.
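The API key step above places the Tavily key in a request header. As a rough illustration of what that header carries, the sketch below builds (but does not send) an equivalent request outside n8n. It assumes Tavily's search endpoint at `https://api.tavily.com/search` with Bearer-token authentication; the environment variable `TAVILY_API_KEY` and the helper `build_tavily_request` are hypothetical names, not part of the workflow.

```python
import json
import os
import urllib.request

# Assumed endpoint; check Tavily's documentation before relying on it.
TAVILY_SEARCH_URL = "https://api.tavily.com/search"


def build_tavily_request(query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a search request carrying the key in a header."""
    if not api_key or api_key.startswith("<"):
        raise ValueError("Set a real Tavily API key; a placeholder was found.")
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        TAVILY_SEARCH_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # TAVILY_API_KEY is a hypothetical environment variable name.
    key = os.environ.get("TAVILY_API_KEY", "")
    if key:
        req = build_tavily_request("competitors of example.com", key)
        print(req.get_method(), req.full_url)
    else:
        print("TAVILY_API_KEY is not set")
```

Reading the key from the environment rather than hardcoding it avoids the problem this step warns about: a shared workflow file that still contains someone else's key.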
What It Does
Form Trigger
Captures user input: “Website URL” and “Scraping Type” (basic or deep).
Scraping Router
A Switch node routes the flow:
Deep Scraping → AI-based MCP Firecrawler agent.
Basic Scraping → Crawlee node.
Deep Scraping (Firecrawl AI Agent)
Uses Firecrawl and Tavily Web Search.
Extracts a detailed JSON profile: mission, services, contacts, SEO keywords, etc.
Basic Scraping (Crawlee)
Uses the Crawl and Scrape node to collect raw text.
A Mistral-based AI extractor structures the data into JSON.
Data Storage
Stores structured data in Supabase tables (companies, company_basicprofiles).
Archives a full JSON backup to Google Drive.
Automated Competitor Analysis
Runs after a deep scrape.
Uses Tavily web search to find competitors (e.g., from Crunchbase).
Saves competitor data to Supabase, linked by company_id.
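The link between a company and its competitors can be sketched as a simple parent/child relationship. The example below uses an in-memory SQLite database as a stand-in for Supabase (which is Postgres-based). The table names `companies` and `competitors` and the `company_id` link come from the description above; every other column name, and the helpers `open_demo_db` and `save_profile`, are assumptions made for illustration.

```python
import json
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE companies (
            id INTEGER PRIMARY KEY,
            website_url TEXT NOT NULL,
            profile_json TEXT NOT NULL
        );
        CREATE TABLE competitors (
            id INTEGER PRIMARY KEY,
            company_id INTEGER NOT NULL REFERENCES companies(id),
            name TEXT NOT NULL
        );
        """
    )
    return conn


def save_profile(conn: sqlite3.Connection, profile: dict, competitors: list[str]) -> int:
    """Insert one company row, then link each competitor to it by company_id."""
    cur = conn.execute(
        "INSERT INTO companies (website_url, profile_json) VALUES (?, ?)",
        (profile["website_url"], json.dumps(profile)),
    )
    company_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO competitors (company_id, name) VALUES (?, ?)",
        [(company_id, name) for name in competitors],
    )
    conn.commit()
    return company_id


if __name__ == "__main__":
    conn = open_demo_db()
    cid = save_profile(
        conn,
        {"website_url": "https://example.com", "mission": "Example mission"},
        ["Acme", "Globex"],
    )
    rows = conn.execute(
        "SELECT name FROM competitors WHERE company_id = ? ORDER BY name", (cid,)
    ).fetchall()
    print(cid, rows)
```

Inserting the company first and reusing its generated id for the competitor rows mirrors the order in which the workflow runs: the scrape is stored, then the competitor agent saves its findings against that company.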
Who's It For
Sales & Marketing Teams: Enrich leads with deep company info.
Market Researchers: Build structured, searchable company databases.
B2B Data Providers: Automate company intelligence collection.
Developers: Use as a base for RAG or enrichment pipelines.
Requirements
n8n instance (self-hosted or cloud)
Supabase Account: With tables like companies, competitors, social_links, etc.
Mistral AI API Key
Google Drive Credentials
Tavily AI API Key
Community Nodes:
n8n-nodes-crawl-and-scrape (used by the basic path)
n8n-nodes-mcp (used by the Firecrawler agent and competitor search)
How It Works
Flow Summary
Form Trigger: Captures “Website URL” and “Scraping Type”.
Switch Node:
deep → MCP Firecrawler (AI Agent).
basic → Crawl and Scrape node.
Scraping & Extraction:
Deep path: Firecrawler → JSON structure.
Basic path: Crawlee → Mistral extractor → JSON.
Storage:
Save JSON to Supabase.
Archive in Google Drive.
Competitor Analysis (Deep Only):
Finds competitors via Tavily.
Saves to Supabase competitors table.
End: Finishes with a No Operation node.
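The routing logic in the flow summary above can be sketched as a small dispatch table. This is an illustration of the control flow only, not the workflow's actual nodes: `deep_scrape` and `basic_scrape` are hypothetical stand-ins for the Firecrawler agent and the Crawl and Scrape plus Mistral extractor path, and the `competitor_analysis` flag simply records that competitor analysis runs only on the deep path.

```python
from typing import Callable


def deep_scrape(url: str) -> dict:
    # Stand-in for the MCP Firecrawler agent (deep path).
    return {"website_url": url, "path": "deep"}


def basic_scrape(url: str) -> dict:
    # Stand-in for Crawl and Scrape followed by the Mistral extractor (basic path).
    return {"website_url": url, "path": "basic"}


# Plays the role of the Switch node: one handler per "Scraping Type" value.
ROUTES: dict[str, Callable[[str], dict]] = {
    "deep": deep_scrape,
    "basic": basic_scrape,
}


def run_workflow(website_url: str, scraping_type: str) -> dict:
    """Route a form submission to the matching scraper and return its profile."""
    key = scraping_type.strip().lower()
    if key not in ROUTES:
        raise ValueError(f"Unknown scraping type: {scraping_type!r}")
    profile = ROUTES[key](website_url)
    # Competitor analysis runs only after a deep scrape.
    profile["competitor_analysis"] = key == "deep"
    return profile


if __name__ == "__main__":
    print(run_workflow("https://example.com", "deep"))
    print(run_workflow("https://example.com", "basic"))
```

Rejecting unknown values explicitly is a design choice for the sketch; it makes a disconnected or misspelled route fail loudly instead of silently doing nothing.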
How To Set Up
Import the workflow JSON.
Install community nodes (especially n8n-nodes-crawl-and-scrape from npm).
Configure credentials (Supabase, Mistral AI, Google Drive).
Add your Tavily API key.
Connect the Supabase and Drive nodes properly.
Fix the disconnected “basic” path if needed.
Activate the workflow.
Test via the webhook form URL.
How To Customize
Change LLMs: Swap Mistral for OpenAI or Claude.
Edit Scraper Prompts: Modify system prompts in AI agent nodes.
Change Extraction Schema: Update the JSON Schema in extractor nodes.
Fix Relational Tables: Add an Items node before the Supabase inserts to handle array fields (social links, keywords).
Enhance Automation: Add email/Slack notifications, or replace the form trigger with a Google Sheets trigger.
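The relational-table fix above amounts to turning each array in the profile into one row per element before inserting. The sketch below shows that transformation in isolation. The field names `social_links` and `seo_keywords`, the column names `url` and `keyword`, and the helper `explode_array` are hypothetical; match them to your own schema.

```python
def explode_array(company_id: int, profile: dict, field: str, column: str) -> list[dict]:
    """Turn one array field of a profile into one row per element."""
    values = profile.get(field) or []  # a missing or null field yields no rows
    return [{"company_id": company_id, column: value} for value in values]


if __name__ == "__main__":
    profile = {
        "social_links": ["https://example.com/a", "https://example.com/b"],
        "seo_keywords": ["crm", "automation"],
    }
    for row in explode_array(1, profile, "social_links", "url"):
        print(row)
    for row in explode_array(1, profile, "seo_keywords", "keyword"):
        print(row)
```

Each resulting row carries the `company_id`, so the child tables stay linked to the parent company in the same way the competitors table is.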
Add-ons
Automated Trigger: Run on new sheet rows.
Notifications: Email or Slack alerts after completion.
RAG Integration: Use the Supabase database as a chatbot knowledge source.
Use Case Examples
Sales Lead Enrichment: Instantly get company + competitor data from a URL.
Market Research: Collect and compare companies in a niche.
B2B Database Creation: Build a proprietary company dataset.
Troubleshooting Guide
| Issue | Possible Cause | Solution |
|-------|----------------|----------|
| Form Trigger 404 | Workflow not active | Activate the workflow |
| Web Search Tool fails | Missing Tavily API key | Replace the placeholder key |
| FIRECRAWLER / find competitor fails | Missing MCP node | Install n8n-nodes-mcp |
| Basic scrape does nothing | Switch node path disconnected | Reconnect “basic” output |
| Supabase node error | Wrong table/column names | Match schema exactly |
Need Help or More Workflows?
Want to customize this workflow for your business or integrate it with your existing tools?
Our team at Digital Biz Tech can tailor it to your use case, from automation logic to AI-powered enhancements.
Contact: shilpa.raju@digitalbiz.tech
For more such offerings, visit us: https://www.digitalbiz.tech