by Incrementors
Yelp Business Scraper by URL via Bright Data API with Google Sheets Storage Overview This n8n workflow automates the process of scraping comprehensive business information from Yelp using individual business URLs. It integrates with Bright Data for professional web scraping and Google Sheets for centralized data storage, providing detailed business intelligence for market research, competitor analysis, and lead generation. Workflow Components 1. 📥 Form Trigger Type**: Form Trigger Purpose**: Initiates the workflow with user-submitted Yelp business URL Input Fields**: URL (Yelp business page URL) Function**: Captures target business URL to start the scraping process 2. 🔍 Trigger Bright Data Scrape Type**: HTTP Request (POST) Purpose**: Sends scraping request to Bright Data API for Yelp business data Endpoint**: https://api.brightdata.com/datasets/v3/trigger Parameters**: Dataset ID: gd_lgugwl0519h1p14rwk Include errors: true Limit multiple results: 5 Limit per input: 20 Function**: Initiates comprehensive business data extraction from Yelp 3. 📡 Monitor Snapshot Status Type**: HTTP Request (GET) Purpose**: Monitors the progress of the Yelp scraping job Endpoint**: https://api.brightdata.com/datasets/v3/progress/{snapshot_id} Function**: Checks if the business data scraping is complete 4. ⏳ Wait 30 Sec for Snapshot Type**: Wait Node Purpose**: Implements intelligent polling mechanism Duration**: 30 seconds Function**: Pauses workflow before rechecking scraping status to optimize API usage 5. 🔁 Retry Until Ready Type**: IF Condition Purpose**: Evaluates scraping completion status Condition**: status === "ready" Logic**: True: Proceeds to data retrieval False: Loops back to status monitoring with wait 6. 📥 Fetch Scraped Business Data Type**: HTTP Request (GET) Purpose**: Retrieves the final scraped business information Endpoint**: https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id} Format**: JSON Function**: Downloads completed Yelp business data with comprehensive details 7. 📊 Store to Google Sheet Type**: Google Sheets Node Purpose**: Stores scraped business data for analysis and storage Operation**: Append rows Target**: "Yelp scraper data by URL" sheet Data Mapping**: Business Name, Overall Rating, Reviews Count Business URL, Images/Videos URLs Additional business metadata fields Workflow Flow Form Input → Trigger Scrape → Monitor Status → Wait 30s → Check Ready ↑ ↓ └─── Loop ─────┘ ↓ Fetch Data → Store to Sheet Configuration Requirements API Keys & Credentials Bright Data API Key**: Required for Yelp business scraping Google Sheets OAuth2**: For data storage and export access n8n Form Webhook**: For user input collection Setup Parameters Google Sheet ID**: Target spreadsheet identifier Dataset ID**: gd_lgugwl0519h1p14rwk (Yelp business scraper) Form Webhook ID**: User input form identifier Google Sheets Credential ID**: OAuth2 authentication Key Features Comprehensive Business Data Extraction Complete business profile information Customer ratings and review counts Contact details and business hours Photo and video content URLs Location and category information Intelligent Status Monitoring Real-time scraping progress tracking Automatic retry mechanisms with 30-second intervals Status validation before data retrieval Error handling and timeout management Centralized Data Storage Automatic Google Sheets export Organized business data format Historical scraping records Easy sharing and collaboration URL-Based Processing Direct Yelp business URL input Single business deep-dive analysis Flexible input through web form Real-time workflow triggering Use Cases Market Research Competitor business analysis Local market intelligence gathering Industry benchmark establishment Service offering comparison Lead Generation Business contact information extraction Potential client identification Market opportunity assessment Sales prospect development Business Intelligence Customer sentiment analysis through ratings Competitor performance monitoring Market positioning research Brand reputation tracking Location Analysis Geographic business distribution Local competition assessment Market saturation evaluation Expansion opportunity identification Data Output Fields | Field | Description | Example | |-------|-------------|---------| | Name | Business name | "Joe's Pizza Restaurant" | | Overall Rating | Average customer rating | "4.5" | | Reviews Count | Total number of reviews | "247" | | URL | Original Yelp business URL | "https://www.yelp.com/biz/joes-pizza..." | | Images/Videos URLs | Media content links | "https://s3-media1.fl.yelpcdn.com/..." | Technical Notes Polling Interval**: 30-second status checks for optimal API usage Result Limiting**: Maximum 20 businesses per input, 5 multiple results Data Format**: JSON with structured field mapping Error Handling**: Comprehensive error tracking in all API requests Retry Logic**: Automatic status rechecking until completion Form Input**: Single URL field with validation Storage Format**: Structured Google Sheets with predefined columns Setup Instructions Step 1: Import Workflow Copy the JSON workflow configuration Import into n8n: Workflows → Import from JSON Paste configuration and save Step 2: Configure Bright Data Set up credentials: Navigate to Credentials → Add Bright Data API Enter your Bright Data API key Test connection Update API key references: Replace BRIGHT_DATA_API_KEY in all HTTP request nodes Verify dataset access for gd_lgugwl0519h1p14rwk Step 3: Configure Google Sheets Create target spreadsheet: Create new Google Sheet named "Yelp Business Data" or similar Copy the Sheet ID from URL Set up OAuth2 credentials: Add Google Sheets OAuth2 credential in n8n Complete authentication process Update workflow references: Replace YOUR_GOOGLE_SHEET_ID with actual Sheet ID Update YOUR_GOOGLE_SHEETS_CREDENTIAL_ID with credential reference Step 4: Test and Activate Test with sample URL: Use a known Yelp business URL Monitor execution progress Verify data appears in Google Sheet Activate workflow: Toggle workflow to "Active" Share form URL with users Sample Business Data The workflow captures comprehensive business information including: Basic Information**: Name, category, location Performance Metrics**: Ratings, review counts, popularity Contact Details**: Phone, website, address Visual Content**: Photos, videos, gallery URLs Operational Data**: Hours, services, amenities Customer Feedback**: Review summaries, sentiment indicators Advanced Configuration Batch Processing Modify the input to accept multiple URLs: [ {"url": "https://www.yelp.com/biz/business-1"}, {"url": "https://www.yelp.com/biz/business-2"}, {"url": "https://www.yelp.com/biz/business-3"} ] Enhanced Data Fields Add more extraction fields by updating the dataset configuration: Business hours and schedule Menu items and pricing Customer photos and reviews Special offers and promotions Notification Integration Add alert mechanisms: Email notifications for completed scrapes Slack messages for team updates Webhook triggers for external systems Error Handling Common Issues Invalid URL**: Ensure URL is a valid Yelp business page Rate Limiting**: Bright Data API usage limits exceeded Authentication**: Google Sheets or Bright Data credential issues Data Format**: Unexpected response structure from Yelp Troubleshooting Steps Verify URLs: Ensure Yelp business URLs are correctly formatted Check Credentials: Validate all API keys and OAuth tokens Monitor Logs: Review n8n execution logs for detailed errors Test Connectivity: Verify network access to all external services Performance Specifications Processing Time**: 2-5 minutes per business URL Data Accuracy**: 95%+ for publicly available business information Success Rate**: 90%+ for valid Yelp business URLs Concurrent Processing**: Depends on Bright Data plan limits Storage Capacity**: Unlimited (Google Sheets based) **For any questions or support, please contact: info@incrementors.com or fill out this form: https://www.incrementors.com/contact-us/
by Julian Kaiser
🗂️ Bulk File Upload to Google Drive with Folder Management How it works User submits files and target folder name via form Workflow checks if folder exists in Drive Creates folder if needed or uses existing one Processes and uploads all files maintaining structure Set up steps (Est. 10-15 mins) Set up Google Drive credentials in n8n Replace parent folder ID in search query with your Drive folder ID Configure form node with: Multiple file upload field Folder name text field Test workflow with sample files 💡 Detailed configuration steps and patterns are documented in sticky notes within the workflow. Perfect for: Bulk file organization Automated Drive folder management File upload automation Maintaining consistent file structures
by Jordan Lee
This flexible template scrapes business listings for any industry and location, perfect for sales teams, marketers, and researchers. Good to know Works with any business category (restaurants, contractors, retailers, etc.) Fully customizable search parameters Results automatically organized in Google Sheets Built-in delay ensures scraping completes before data collection How it works Trigger: Manual or scheduled start Apify Configuration: Sets scraping parameters (industry, location, data fields) Scraping Execution: Runs the web scraping job Data Processing: Cleans and structures the raw data Storage: Saves results to your Google Sheets What is Apify? Apify is a webscraping tool, in this workflow the data is scraped from a google maps scraper: https://apify.com/compass/crawler-google-places How to use Apify Small # Lead Generation (Purple) https://apify.com/compass/crawler-google-places Add location and industry to scrape (Apify) Add the number of leads to output (Apify) Copy over the JSON file into N8N Copy & paste API endpoint "Get Run URL" in N8N Apify Large # Lead Generation (Grey) Configure the Manual Trigger When clicking 'Execute workflow' node is ready to use as-is This triggers the entire lead generation process Setup "Start Results (Apify)" Node Get Your Apify API Information Go to Apify.com and create a free account Navigate to Settings → Integrations → API tokens Copy your API token Find the Google Maps scraper actor ID: Configure the HTTP Request (start results) Method: POST URL: Replace "enter apify (get run)" with: https://api.apify.com/v2/acts/nwua9Gu5YrADL7ZDj/runs?token=YOUR_API_TOKEN C. Customize the JSON Body Parameters In the JSON body, modify these key fields: Location & Search: "locationQuery": Change "Toronto" to your target city "searchStringsArray": Change ["barber"] to your business type Examples: ["restaurants"], ["dentists"], ["contractors"] Configure the HTTP Request (start results) Method : Get Url: enter the get dataset URL from Apify Split Out Node Select fields to append in the google sheet Test the Configuration Click Execute workflow to test Check that the Apify job starts successfully Note the job ID returned for the next section This section initiates the scraping process and should complete in 30-60 seconds depending on your lead count. Setup Google Sheets Create a new Google Sheet with these columns: title (business name) address (full address) state (state/province) neighborhood (area/district) phone (contact number) emails (email addresses) Copy your Google Sheets document ID for workflow configuration Requirements Apify account Google Sheets document Google OAuth credentials Customization Options For different use cases: Lead Gen: Get business leads Local SEO: Collect competitor data Market Research: Analyze industry trends Advanced mofications: Add email enrichment Integrate with CRM systems Set up automatic daily runs
by WeblineIndia
Smart Document Parser for Invoices, Logs or Sensor Reports (PDF/Image to Google Sheets) This n8n workflow automatically parses documents such as invoices, sensor logs or structured PDFs/images (including scanned docs or CSVs), extracts key fields like totals, dates and customer/vendor info using OCR and AI, and writes the structured output into Google Sheets. Who’s it for Finance or Ops teams automating invoice processing. SaaS platforms parsing uploaded reports or documents. Anyone needing a no-code backend for PDF/image/CSV document parsing. AI-powered data capture pipelines. How it works Webhook Trigger receives file uploads (/uploadDoc) Switch Node checks the file type: If image → Use Tesseract OCR If PDF → Use PDF parser If CSV → Extract as-is Extracted text is passed to: Google Gemini or Gemini Flash AI model Prompt extracts fields like invoice_id, total, customer_name, etc. JSON string is parsed and cleaned Data is appended to Google Sheets using appendOrUpdate How to set up Create a Google Sheet with columns like: invoice_id, invoice_date, due_date, customer_name, vendor_name, subtotal, tax_total, total, currency Connect: Google Sheets OAuth Google Gemini (PaLM API key) for LLM parsing Deploy the webhook endpoint: /uploadDoc Upload sample files (PDFs, images, CSVs) to test Review and map sheet columns in the Invoice Data node Requirements | Tool | Purpose | | ------------- | --------------------------------- | | n8n | Automation framework | | Google Sheets | To store structured output | | Tesseract OCR | For scanned image text extraction | | Google Gemini | For natural language parsing | How to customize Add extraction for line items using structured prompts. Change prompt to extract sensor readings, log types, or custom keys. Add support for other file types (e.g., XLSX, DOCX). Add Slack/Email notifications on success/failure. Swap Gemini with OpenAI or Hugging Face if preferred. Add‑ons Save uploaded files to Google Drive or S3 Add auth for secure uploads Use charting/dashboard nodes to visualize extracted data Integrate with billing/accounting software Use Case Examples | Scenario | What Happens | | ----------------------- | ------------------------------------------------------- | | Invoice Upload (PDF) | Extracts totals, customer, tax data into a Google Sheet | | Scanned Receipt (Image) | OCR + LLM extracts structured data | | Log File (CSV) | Parses and logs entries into Sheets | Common troubleshooting | Issue | Possible Cause | Solution | | --------------------------------- | ----------------------- | ------------------------------------------- | | Webhook not triggered | URL or method mismatch | Use correct POST URL /uploadDoc | | Text is blank | OCR failed | Check image quality or Tesseract config | | Gemini model not returning JSON | Prompt formatting issue | Ensure prompt ends with valid JSON schema | | Sheet not updated | Invalid Sheet ID or tab | Double-check sheet credentials and tab name | Need Help? Need help fine-tuning the Gemini prompt for better field accuracy? Want to extract full tables, multi-page invoices or convert PDFs to JSON lines? Our automation team at WeblineIndia can help you extend this into a full-blown document automation pipeline.
by Extruct AI
Who’s it for: Sales teams, marketers, and analysts who need to quickly access all the social media and public profile links for any company. How it works / What it does: When you enter a company into the form, this workflow automatically searches for and collects all available links to the company’s social media accounts, review sites, and public profiles from sources like Crunchbase and Zoominfo. All discovered URLs are added directly to your Google Sheet. How to set up: Create an Extruct account at www.extruct.ai/. Open the Extruct table template, find the table ID in your browser’s address bar, and copy it. Make a copy of the provided Google Sheets template to your own Google Drive. In n8n, paste the table ID into the variables node of your flow. Set up Bearer authentication in every HTTP Request node using your Extruct API token (found on the API page in Extruct). In the Google Sheets node, paste the link to your copied template and connect your Google account. Run the flow once to load the fields, then map the output fields to the correct columns in your sheet. Activate the flow and start adding companies via the form. Requirements: Extruct account and API token Extruct table template Google account with Google Sheets How to customize the workflow: You can add your own columns to the Extruct table and your Google Sheet. Just add the new column in both places and map it in the Google Sheets node in n8n.
by darrell_tw
How it works Fetch all workflows from your n8n instance. Filter workflows that contain nodes with a modelId setting. Extract the node names, model IDs, model names, workflow names, and workflow URLs. Save the extracted information into a connected Google Sheet. Set up steps Connect your n8n API credentials. Connect your Google Sheets account. Replace "Your n8n domain" with your actual domain URL. Use this Google Sheet template to create a new sheet for results. Setup typically takes 5 minutes. Be cautious: if you have over 100 workflows, performance may be impacted. Notes Sticky notes inside the workflow provide extra guidance. This workflow clears old sheet data before writing new results. Make sure your n8n instance allows API access. Result Example Update: It didn't detect the AI model in tool originally. Now it's fixed! Update 20250429: Support 1.91.0 with open node directly! Optimize the url with node id.
by Federico De Ponte
🔁 Loop & Optimize Meta Tags with Google Gemini This workflow automates the shortening of meta titles and descriptions for SEO—directly from your Google Sheet, row by row, using Google Gemini. ✅ What it does Reads rows from a Google Sheet (meta_title, meta_description, row_index) Loops through each row and checks if content exists Sends the data to Google Gemini for length-optimized output Cleans and parses the response Updates the original sheet with the shortened results 🛠️ Setup Requirements Google Sheets (OAuth2 credentials connected in n8n) Google Gemini API key (configured in n8n credentials) Sheet must contain: row_index meta_title meta_description Output will be written into: meta_titleFixed meta_descriptionFixed
by Ranjan Dailata
Who this is for? This workflow enables automated, scalable collection of high-quality, AI-ready data from websites using Bright Data’s Web Unlocker, with a focus on preparing that data for LLM training. Leveraging LLM Chains and AI agents, the system formats and extracts key information, then stores the structured embeddings in a Pinecone vector database. This workflow is tailored for: ML Engineers & Researchers building or fine-tuning domain-specific LLMs. AI Startups needing clean, structured content for product training. Data Teams preparing knowledge bases for enterprise-grade AI apps. LLM-as-a-Service Providers sourcing dynamic web content across niches. What problem is this workflow solving? Training a large language model (LLM) requires vast amounts of clean, relevant, and structured data. Manual collection is slow, error-prone, and lacks scalability. This workflow: Automatically extracts web data from specified URLs. Bypasses anti-bot measures using Bright Data’s Web Unlocker. Formats, cleans, and transforms raw content using LLM agents. Stores semantically searchable vectors in Pinecone. Makes datasets AI-ready for fine-tuning, RAG, or domain-specific training. What this workflow does This workflow automates the process of collecting, cleaning, and vectorizing web content to create structured, high-quality datasets that are ready to be used for LLM (Large Language Model) training or retrieval-augmented generation (RAG). Web Crawling with Bright Data Web Unlocker. AI Information Extraction and Data Formatting. AI Data Formatting to produce a JSON structured data. Persistence in Pinecone Vector DB. Handle Webhook notification of structured data. Setup Sign up at Bright Data. Navigate to Proxies & Scraping and create a new Web Unlocker zone by selecting Web Unlocker API under Scraping Solutions. In n8n, configure the Header Auth account under Credentials (Generic Auth Type: Header Authentication). The Value field should be set with the Bearer XXXXXXXXXXXXXX. The XXXXXXXXXXXXXX should be replaced by the Web Unlocker Token. A Google Gemini API key (or access through Vertex AI or proxy). Update the LinkedIn URL by navigating to the Set LinkedIn URL node. Update the Set Fields - URL and Webhook URL node with the URL for web data extraction and the Webhook notification URL. How to customize this workflow to your needs Set Your Target URLs. Target sites that are high-quality, domain-specific, and relevant to your LLM's purpose. Adjust Bright Data Web Unlocker Settings. Geo-location, Headers / User-Agent strings, Retry rules and proxies. Modify the Information Extraction Logic. Change prompts to extract specific attributes. Use structured templates or few-shot examples in prompts. Swap the Embedding Model. Use OpenAI, Hugging Face or other your own hosted embedding model API. Customize Pinecone Metadata Fields. Store extra fields in Pinecone for better filtering & semantic querying. Add Data Validation or Deduplication. Skip duplicates or low-quality content.
by Mutasem
Use Case This workflow aims to enrich new contacts in Intercom. The more relevant the Intercom profile, the more useful it is. Once active, this n8n workflow will update contact data (phone, email) as well as location data from ExactBuyer. Setup Add a webhook url in Intercom to call this workflow Add your Exact Buyer API key Add your Intercom API key Activate workflow How to adjust this template There's plenty of interesting info that ExactBuyer returns that could be helpful. Take a look and update this workflow to add what you need.
by Joachim Hummel
This n8n workflow automates posting Amazon affiliate products to Mastodon — complete with image upload, description, and a shortened tracking URL using Shlink. 🔧 How it works Input Source: The workflow starts by reading from a connected Google Sheet that contains: SHlink (Shortlink) Amazon Link Description (Optional) PicURL Send /NO or YES A Send column (used as a flag to check if the row was already posted) Image Upload: It fetches the product image via HTTP and uploads it directly to a Mastodon instance via the /media API endpoint. URL Shortening (Shlink): The original Amazon URL is shortened using your self-hosted or cloud-hosted Shlink instance to enable click tracking and better presentation. Text Generation: A two-line promotional text is automatically generated using a Language Model (LLM), based on the product description. Posting to Mastodon: The post is then published on Mastodon with: The image The generated text The shortened Shlink URL Row Update: Once published, the Send column in the Google Sheet is updated to "YES" to prevent duplicates. Requirements ✅ Shlink – Required for shortening and tracking Amazon URLs ✅ Google Sheet – Used as a product queue and post ✅ Google Sheet Example https://link.unixweb.home64.de/w7VqY ✅ Mastodon account – OAuth2 credentials with write scope ✅ Product image URL – Must be valid and accessible ✅ n8n credentials – Set up for Google Sheets, Mastodon, and optionally OpenRouter or other LLM providers This workflow is ideal for content creators, affiliate marketers, and automation fans who want to save time and optimize reach across the Fediverse. #affiliate #amazon #mastodon #advertisment
by ist00dent
This n8n template provides a simple yet powerful utility for validating if a given string input is a valid JSON format. You can use this to pre-validate data received from external sources, ensure data integrity before further processing, or provide immediate feedback to users submitting JSON strings. 🔧 How it works Webhook: This node acts as the entry point for the workflow, listening for incoming POST requests. It expects a JSON body with a single property: jsonString: The string that you want to validate as JSON. Code (JSON Validator): This node contains custom JavaScript code that attempts to parse the jsonString provided in the webhook body. If the jsonString can be successfully parsed, it means it's valid JSON, and the node returns an item with valid: true. If parsing fails, it catches the error and returns an item with valid: false and the specific error message. This logic is applied to each item passed through the node, ensuring all inputs are validated. Respond to Webhook: This node sends the validation result (either valid: true or valid: false with an error message) back to the service that initiated the webhook request. 👤 Who is it for? This workflow is ideal for: Developers & Integrators: Pre-validate JSON payloads from external systems (APIs, webhooks) before processing them in your workflows, preventing errors. Data Engineers: Ensure the integrity of JSON data before storing it in databases or data lakes. API Builders: Offer a dedicated endpoint for clients to test their JSON strings for validity. Customer Support Teams: Quickly check user-provided JSON configurations for errors. Anyone handling JSON data: A quick and easy way to programmatically check JSON string correctness without writing custom code in every application. 📑 Data Structure When you trigger the webhook, send a POST request with a JSON body structured as follows: { "jsonString": "{\"name\": \"n8n\", \"type\": \"workflow\"}" } Example of an invalid JSON string: { "jsonString": "{name: \"n8n\"}" // Missing quotes around 'name' } The workflow will return a JSON response indicating validity: For a valid JSON string: { "valid": true } For an invalid JSON string: { "valid": false, "error": "Unexpected token 'n', \"{name: \"n8n\"}\" is not valid JSON" } ⚙️ Setup Instructions Import Workflow: In your n8n editor, click "Import from JSON" and paste the provided workflow JSON. Configure Webhook Path: Double-click the Webhook node. In the 'Path' field, set a unique and descriptive path (e.g., /validate-json). Activate Workflow: Save and activate the workflow. 📝 Tips This JSON validator workflow is a solid starting point. Consider these enhancements: Enhanced Error Feedback: Upgrade: Add a Set node after the Code node to format the error message into a more user-friendly string before responding. Leverage: Make it easier for the caller to understand the issue. Logging Invalid Inputs: Upgrade: After the Code node, add an IF node to check if valid is false. If so, branch to a node that logs the invalid jsonString and error to a Google Sheet, database, or a logging service. Leverage: Track common invalid inputs for debugging or improvement. Transforming Valid JSON: Upgrade: If the JSON is valid, you could add another Function node to parse the jsonString and then operate on the parsed JSON data directly within the workflow. Leverage: Use this validator as the first step in a larger workflow that processes JSON data. Asynchronous Validation: Upgrade: For very large JSON strings or high-volume requests, consider using a separate queueing mechanism (e.g., RabbitMQ, SQS) and an asynchronous response pattern. Leverage: Prevent webhook timeouts and improve system responsiveness.
by Airtop
README Automating Video File Download from Sample.cat with Airtop.ai Use Case Automating file downloads from web pages is useful for scenarios like bulk media retrieval, dataset access, or recurring content backups. This workflow ensures a hands-free, consistent process for retrieving downloadable content. What This Automation Does This automation performs a reliable download of a video file from a specified webpage using the following steps: Initiates an Airtop browser session. Opens a specified URL containing downloadable media. Interacts with the page to click the download button. Waits for the file to be processed and made available. Retrieves metadata to confirm availability. Downloads the file. Terminates the browser session to clean up resources. How It Works Manual Trigger: Activated by user test. Session: Starts an Airtop browser session. Window: Navigates to https://sample.cat/en/webm. Interaction: Simulates a click on the download button for the video titled “SD 640x360 (Seawater, drone view video, 30 FPS)”. Wait: Pauses for 10 seconds to allow the file to be ready for download. Get File Data: Checks for downloadable files in the session. Download File: Retrieves the file using its ID. Terminate: Ends the browser session to free up resources. Setup Requirements Airtop API Key — required to authenticate API calls. Next Steps Enhance with Retry Logic**: Loop file availability check until status = available for more robust automation. Customize File Targets**: Dynamically pass URLs and button descriptors for multi-source downloads. Connect to Storage**: Pipe downloaded files to cloud storage or databases for archiving. Read more about automating file downloads from the web