by Onur
# Yelp Business Scraper by URL via Scrape.do API with Google Sheets Storage

## Overview

This n8n workflow automates scraping comprehensive business information from Yelp using individual business URLs. It integrates with Scrape.do for professional web scraping with anti-bot bypass and with Google Sheets for centralized data storage, providing detailed business intelligence for market research, competitor analysis, and lead generation.

## Workflow Components

### 1. 📥 Form Trigger

| Property | Value |
|----------|-------|
| Type | Form Trigger |
| Purpose | Initiates the workflow with a user-submitted Yelp business URL |
| Input Fields | Yelp Business URL |
| Function | Captures the target business URL to start the scraping process |

### 2. 🔍 Create Scrape.do Job

| Property | Value |
|----------|-------|
| Type | HTTP Request (POST) |
| Purpose | Creates an async scraping job via the Scrape.do API |
| Endpoint | https://q.scrape.do/api/v1/jobs |
| Authentication | X-Token header |

Request Parameters:

- **Targets**: Array containing the Yelp business URL
- **Super**: true (uses residential/mobile proxies for a better success rate)
- **GeoCode**: us (targets US-based content)
- **Device**: desktop
- **Render**: JavaScript rendering enabled with the networkidle2 wait condition

Function: Initiates comprehensive business data extraction from Yelp with headless browser rendering to handle dynamic content.

### 3. 🔧 Parse Yelp HTML

| Property | Value |
|----------|-------|
| Type | Code Node (JavaScript) |
| Purpose | Extracts structured business data from raw HTML |
| Mode | Run once for each item |

Function: Parses the scraped HTML using regex patterns and JSON-LD extraction to retrieve:

- Business name
- Overall rating
- Review count
- Phone number
- Full address
- Price range
- Categories
- Website URL
- Business hours
- Image URLs
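For reference when customizing this node, here is a minimal sketch of the kind of logic it runs. The JSON-LD checks, regex patterns, and the `html` property name are illustrative assumptions, not the template's exact code; Yelp's markup changes frequently, so expect to adjust the patterns.

```javascript
// Minimal sketch of a "Parse Yelp HTML" Code node (mode: run once for each item).
// Assumes the scraped page is available on $json.html; adjust to the actual
// property name returned by the Scrape.do result node.
const html = $json.html || '';

// Try JSON-LD first: business pages often embed structured data in
// <script type="application/ld+json"> blocks.
let business = {};
const ldBlocks = html.match(/<script type="application\/ld\+json">([\s\S]*?)<\/script>/g) || [];
for (const block of ldBlocks) {
  try {
    const data = JSON.parse(block.replace(/<\/?script[^>]*>/g, ''));
    if (data.name && (data.aggregateRating || data.telephone)) {
      business = data;
      break;
    }
  } catch (e) { /* ignore malformed JSON-LD blocks */ }
}

// Simple regex fallback for fields missing from JSON-LD.
const phoneMatch = html.match(/\(\d{3}\)\s?\d{3}-\d{4}/);

return {
  json: {
    name: business.name || null,
    overall_rating: business.aggregateRating ? String(business.aggregateRating.ratingValue) : null,
    reviews_count: business.aggregateRating ? String(business.aggregateRating.reviewCount) : null,
    phone: business.telephone || (phoneMatch ? phoneMatch[0] : null),
    price_range: business.priceRange || null,
    scraped_at: new Date().toISOString(),
    // Remaining fields (address, categories, website, hours, images) follow the same pattern.
  },
};
```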
### 4. 📊 Store to Google Sheet

| Property | Value |
|----------|-------|
| Type | Google Sheets Node |
| Purpose | Stores scraped business data for analysis |
| Operation | Append rows |
| Target | "Yelp Scraper Data - Scrape.do" sheet |

Data Mapping:

- Business Name, Overall Rating, Reviews Count
- Business URL, Phone, Address
- Price Range, Categories, Website
- Hours, Images/Videos URLs, Scraped Timestamp

## Workflow Flow

Form Input → Create Scrape.do Job → Parse Yelp HTML → Store to Google Sheet

1. User submits a Yelp URL
2. API creates the job with JS rendering
3. JavaScript code extracts the fields
4. Data is appended to the spreadsheet

## Configuration Requirements

### API Keys & Credentials

| Credential | Purpose |
|------------|---------|
| Scrape.do API Token | Required for Yelp business scraping with anti-bot bypass |
| Google Sheets OAuth2 | For data storage and export access |
| n8n Form Webhook | For user input collection |

### Setup Parameters

| Parameter | Description |
|-----------|-------------|
| YOUR_SCRAPEDO_TOKEN | Your Scrape.do API token (appears in 3 places) |
| YOUR_GOOGLE_SHEET_ID | Target spreadsheet identifier |
| YOUR_GOOGLE_SHEETS_CREDENTIAL_ID | OAuth2 authentication reference |

## Key Features

### 🛡️ Anti-Bot Bypass Technology

- **Residential Proxy Rotation**: 110M+ proxies across 150 countries
- **WAF Bypass**: Handles Cloudflare, Akamai, DataDome, and PerimeterX
- **Dynamic TLS Fingerprinting**: Authentic browser signatures
- **CAPTCHA Handling**: Automatic bypass for uninterrupted scraping

### 🌐 JavaScript Rendering

- Full headless browser support for dynamic Yelp content
- networkidle2 wait condition ensures a complete page load
- Custom wait times for complex page elements
- Real device fingerprints for detection avoidance

### 📊 Comprehensive Data Extraction

| Field | Description | Example |
|-------|-------------|---------|
| name | Business name | "Joe's Pizza Restaurant" |
| overall_rating | Average customer rating | "4.5" |
| reviews_count | Total number of reviews | "247" |
| url | Original Yelp business URL | "https://www.yelp.com/biz/..." |
| phone | Business phone number | "(555) 123-4567" |
| address | Full street address | "123 Main St, New York, NY 10001" |
| price_range | Price indicator | "$$" |
| categories | Business categories | "Pizza, Italian, Delivery" |
| website | Business website URL | "https://joespizza.com" |
| hours | Operating hours | "Mon-Fri 11:00-22:00" |
| images_videos_urls | Media content links | "https://s3-media1.fl.yelpcdn.com/..." |
| scraped_at | Extraction timestamp | "2025-01-15T10:30:00Z" |

### 🗂️ Centralized Data Storage

- Automatic Google Sheets export
- Organized business data format with 12 data fields
- Historical scraping records with timestamps
- Easy sharing and collaboration

## Use Cases

### 📈 Market Research
- Competitor business analysis
- Local market intelligence gathering
- Industry benchmark establishment
- Service offering comparison

### 🎯 Lead Generation
- Business contact information extraction
- Potential client identification
- Market opportunity assessment
- Sales prospect development

### 📊 Business Intelligence
- Customer sentiment analysis through ratings
- Competitor performance monitoring
- Market positioning research
- Brand reputation tracking

### 📍 Location Analysis
- Geographic business distribution
- Local competition assessment
- Market saturation evaluation
- Expansion opportunity identification

## Technical Notes

| Specification | Value |
|---------------|-------|
| Processing Time | 15-45 seconds per business URL |
| Data Accuracy | 95%+ for publicly available business information |
| Success Rate | 99.98% (Scrape.do guarantee) |
| Proxy Pool | 110M+ residential, mobile, and datacenter IPs |
| JS Rendering | Full headless browser with networkidle2 wait |
| Data Format | JSON with structured field mapping |
| Storage Format | Structured Google Sheets with 12 predefined columns |

## Setup Instructions

### Step 1: Import Workflow

1. Copy the JSON workflow configuration
2. Import into n8n: Workflows → Import from JSON
3. Paste the configuration and save

### Step 2: Configure Scrape.do

Get your API token:

1. Sign up at Scrape.do
2. Navigate to Dashboard → API Token
3. Copy your token

Update workflow references (3 places):

- 🔍 Create Scrape.do Job node → Headers → X-Token
- 📡 Check Job Status node → Headers → X-Token
- 📥 Fetch Task Results node → Headers → X-Token

Replace YOUR_SCRAPEDO_TOKEN with your actual API token.
### Step 3: Configure Google Sheets

Create the target spreadsheet:

1. Create a new Google Sheet named "Yelp Business Data" or similar
2. Add a header row with the columns: name | overall_rating | reviews_count | url | phone | address | price_range | categories | website | hours | images_videos_urls | scraped_at
3. Copy the Sheet ID from the URL (the long string between /d/ and /edit)

Set up OAuth2 credentials:

1. In n8n: Credentials → Add Credential → Google Sheets OAuth2
2. Complete the Google authentication process
3. Grant access to Google Sheets

Update workflow references:

- Replace YOUR_GOOGLE_SHEET_ID with your actual Sheet ID
- Update YOUR_GOOGLE_SHEETS_CREDENTIAL_ID with your credential reference

### Step 4: Test and Activate

Test with a sample URL:

1. Use a known Yelp business URL (e.g., https://www.yelp.com/biz/example-business-city)
2. Submit it through the form trigger
3. Monitor execution progress in n8n
4. Verify the data appears in the Google Sheet

Activate the workflow:

1. Toggle the workflow to "Active"
2. Share the form URL with users

## Sample Business Data

The workflow captures comprehensive business information, including:

| Category | Data Points |
|----------|-------------|
| Basic Information | Name, category, location |
| Performance Metrics | Ratings, review counts, popularity |
| Contact Details | Phone, website, address |
| Visual Content | Photos, videos, gallery URLs |
| Operational Data | Hours, services, price range |

## Advanced Configuration

### Batch Processing

Modify the input to accept multiple URLs by updating the job creation body:

```json
{
  "Targets": [
    "https://www.yelp.com/biz/business-1",
    "https://www.yelp.com/biz/business-2",
    "https://www.yelp.com/biz/business-3"
  ],
  "Super": true,
  "GeoCode": "us",
  "Render": {
    "WaitUntil": "networkidle2",
    "CustomWait": 3000
  }
}
```

### Enhanced Rendering Options

For complex Yelp pages, add browser interactions:

```json
{
  "Render": {
    "BlockResources": false,
    "WaitUntil": "networkidle2",
    "CustomWait": 5000,
    "WaitSelector": ".biz-page-header",
    "PlayWithBrowser": [
      { "Action": "Scroll", "Direction": "down" },
      { "Action": "Wait", "Timeout": 2000 }
    ]
  }
}
```

### Notification Integration

Add alert mechanisms:

- Email notifications for completed scrapes
- Slack messages for team updates
- Webhook triggers for external systems

## Error Handling

### Common Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| Invalid URL | URL is not a valid Yelp business page | Ensure the URL format is https://www.yelp.com/biz/... |
| 401 Unauthorized | Invalid or missing API token | Verify the X-Token header value |
| Job Timeout | Page too complex or slow | Increase the CustomWait value |
| Empty Data | HTML parsing failed | Check the page structure, update regex patterns |
| Rate Limiting | Too many concurrent requests | Reduce request frequency or upgrade your plan |

### Troubleshooting Steps

1. Verify URLs: Ensure Yelp business URLs are correctly formatted
2. Check Credentials: Validate the Scrape.do token and Google OAuth
3. Monitor Logs: Review n8n execution logs for detailed errors
4. Test Connectivity: Verify network access to all external services
5. Check Job Status: Use the Scrape.do dashboard to monitor job progress

## Performance Specifications

| Metric | Value |
|--------|-------|
| Processing Time | 15-45 seconds per business URL |
| Data Accuracy | 95%+ for publicly available information |
| Success Rate | 99.98% (with Scrape.do anti-bot bypass) |
| Concurrent Processing | Depends on Scrape.do plan limits |
| Storage Capacity | Unlimited (Google Sheets based) |
| Proxy Pool | 110M+ IPs across 150 countries |

## Scrape.do API Reference

### Async API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| /api/v1/jobs | POST | Create new scraping job |
| /api/v1/jobs/{jobID} | GET | Check job status |
| /api/v1/jobs/{jobID}/{taskID} | GET | Retrieve task results |
| /api/v1/me | GET | Get account information |

### Job Status Values

| Status | Description |
|--------|-------------|
| queuing | Job is being prepared |
| queued | Job is in the queue waiting to be processed |
| pending | Job is currently being processed |
| rotating | Job is retrying with different proxies |
| success | Job completed successfully |
| error | Job failed |
| canceled | Job was canceled by the user |

For complete API documentation, visit: Scrape.do Documentation

## Support & Resources

- **Scrape.do Documentation**: https://scrape.do/documentation/
- **Scrape.do Dashboard**: https://dashboard.scrape.do/
- **n8n Documentation**: https://docs.n8n.io/
- **Google Sheets API**: https://developers.google.com/sheets/api

This workflow is powered by Scrape.do - Reliable, Scalable, Unstoppable Web Scraping
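As a companion to the API reference above, here is a hedged sketch of how the async job could be polled from a script. The endpoint paths and status values come from this description, but the response field name (`status`) is an assumption to verify against the Scrape.do documentation; in the workflow itself this loop is expressed with the Check Job Status, Wait, and Fetch Task Results nodes rather than code.

```javascript
// Polls a Scrape.do async job until it finishes (sketch only).
const TOKEN = process.env.SCRAPEDO_TOKEN; // your X-Token value
const BASE = 'https://q.scrape.do/api/v1';

async function waitForJob(jobId) {
  while (true) {
    const res = await fetch(`${BASE}/jobs/${jobId}`, { headers: { 'X-Token': TOKEN } });
    const job = await res.json();
    if (job.status === 'success') return job;                 // field name assumed
    if (job.status === 'error') throw new Error('Scrape.do job failed');
    await new Promise((resolve) => setTimeout(resolve, 5000)); // poll every 5 seconds
  }
}
```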
by Max Tkacz
Easily generate images with Black Forest Labs' Flux Text-to-Image AI models using Hugging Face's Inference API. This template serves a webform where you can enter prompts and select predefined visual styles that are customizable with no code. The workflow integrates seamlessly with Hugging Face's free tier, and it's easy to modify for any Text-to-Image model that supports API access.

### Try it

Curious what this template does? Try a public version here: https://devrel.app.n8n.cloud/form/flux

### Set Up

Watch this quick set up video 👇

Accounts required:

- Huggingface.co account (free)
- Cloudflare.com account (free - used for storage, but can be swapped easily, e.g. for Google Drive)

### Key Features

- **Text-to-Image Creation**: Generates unique visuals based on your prompt and style.
- **Hugging Face Integration**: Utilizes Hugging Face's Inference API for reliable image generation.
- **Customizable Visual Styles**: Select from preset styles or easily add your own.
- **Adaptable**: Swap in any Hugging Face Text-to-Image model that supports API calls.

### Ideal for

- **Creators**: Rapidly create visuals for projects.
- **Marketers**: Prototype campaign visuals.
- **Developers**: Test different AI image models effortlessly.

### How It Works

You submit an image prompt via the webform and select a visual style, which appends style instructions to your prompt. The Hugging Face Inference API then generates and returns the image, which is hosted on Cloudflare S3. The workflow can easily be adjusted to use other models and styles for complete flexibility.
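For orientation, the core request the workflow's HTTP Request node makes looks roughly like the sketch below. The model id, style handling, and response handling are assumptions for illustration; the Inference API returns raw image bytes for text-to-image models, which the workflow then uploads to storage.

```javascript
// Sketch of the Hugging Face Inference API call behind this template.
const HF_TOKEN = process.env.HF_TOKEN;                 // your Hugging Face access token
const model = 'black-forest-labs/FLUX.1-schnell';      // example Flux model id

async function generateImage(prompt, style = '') {
  const response = await fetch(`https://api-inference.huggingface.co/models/${model}`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${HF_TOKEN}`,
      'Content-Type': 'application/json',
    },
    // The workflow appends the selected style instructions to the user's prompt.
    body: JSON.stringify({ inputs: `${prompt}. ${style}`.trim() }),
  });
  if (!response.ok) throw new Error(`Inference API error: ${response.status}`);
  // Text-to-image models return the generated image as raw bytes.
  return Buffer.from(await response.arrayBuffer());
}
```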
by Mutasem
### Use case

This workflow snoozes Todoist tasks by moving them into a Snoozed Todoist project and unsnoozes them 3 days before their due date. It helps keep your Inbox limited to the tasks you need to worry about soon.

### How to set up

1. Add your Todoist credentials.
2. Create a Todoist project called snoozed.
3. Set the project IDs in the relevant nodes.
4. Add due dates to your tasks in Inbox. Watch them disappear to snoozed. Set a task's date to tomorrow and watch it return to Inbox.

### How to adjust this template

Adjust the timeline. Maybe 3 days is too close for you. Works mostly for me :)
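If you want to tweak the unsnooze window in a Code node instead of the built-in date filter, the check looks roughly like this. The `due.date` field name follows the Todoist API task shape and is an assumption about how your Todoist node outputs items.

```javascript
// Sketch: keep only snoozed tasks that are due within the next 3 days,
// so they can be moved back to the Inbox project.
const DAYS_BEFORE_DUE = 3;
const now = new Date();

return items.filter((item) => {
  const due = item.json.due && item.json.due.date; // e.g. "2025-01-20"
  if (!due) return false;
  const msUntilDue = new Date(due) - now;
  return msUntilDue <= DAYS_BEFORE_DUE * 24 * 60 * 60 * 1000;
});
```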
by n8n Team
This n8n workflow automates the monitoring and notification of Palo Alto Networks security advisories. It is triggered manually from within the n8n UI or scheduled to run daily at midnight using the Schedule Trigger.

The workflow begins by fetching the latest security advisories from Palo Alto Networks' RSS feed. Each advisory is then processed, and relevant information is extracted and categorized, including the advisory type, subject, and severity. The workflow checks the publication date of each advisory to ensure that it was posted within the last 24 hours, filtering out older advisories.

The workflow then splits into two paths based on the advisory type: GlobalProtect and Traps. In the GlobalProtect path, advisories related to GlobalProtect are identified and used to create Jira issues. The Jira issues include a summary with the advisory title and a description that provides details about the advisory, its severity, link, and publication date. In the Traps path, advisories related to Traps are recognized, and dummy data (which should be replaced with logic to retrieve valid user emails) is generated for sample purposes. These email addresses are then used to send email notifications using the Gmail node. Each email's subject includes the type of advisory, while the body contains the advisory title and a link for more information.

Potential issues when setting up this workflow for the first time might involve configuring the Schedule Trigger to match the desired time zone. Additionally, ensuring that the Jira and Gmail nodes are configured correctly with the required credentials and email addresses is crucial. The placeholder for generating dummy data for email recipients should be replaced with logic to retrieve valid user emails. Proper error handling and testing with real and sample advisories can help identify and resolve any potential issues during setup.
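The filtering and categorization step described above could be expressed in a Code node roughly as follows. The RSS field names (`pubDate`, `title`) and the keyword matching are assumptions for illustration; the actual output shape depends on the RSS Read node.

```javascript
// Sketch: keep advisories published in the last 24 hours and tag them by type/severity.
const DAY_MS = 24 * 60 * 60 * 1000;
const now = Date.now();

return items
  .filter((item) => now - new Date(item.json.pubDate).getTime() <= DAY_MS)
  .map((item) => {
    const title = item.json.title || '';
    return {
      json: {
        ...item.json,
        advisoryType: /GlobalProtect/i.test(title) ? 'GlobalProtect'
          : /Traps/i.test(title) ? 'Traps'
          : 'Other',
        severity: (title.match(/critical|high|medium|low/i) || ['unknown'])[0],
      },
    };
  });
```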
by Robert Breen
# Create multi-sheet Excel workbooks in n8n to automate reporting using Google Drive + Google Sheets

Build an automated Excel file with multiple tabs directly in n8n. Two Code nodes generate datasets, each is converted into its own Excel worksheet, then combined into a single .xlsx and (optionally) appended to a Google Sheet for sharing, eliminating manual copy-paste and speeding up reporting.

### Who's it for

- Teams that publish recurring reports as Excel with multiple tabs
- Ops/Marketing/Data folks who want a no-code/low-code way to package JSON into Excel
- n8n beginners learning the Code → Convert to File → Merge pattern

### How it works

1. Manual Trigger starts the run.
2. Code nodes emit JSON rows for each table (e.g., People, Locations); see the sketch at the end of this description.
3. Convert to File nodes turn each JSON list into an Excel binary, assigning Sheet1/Sheet2 (or your names).
4. Merge combines both binaries into a single Excel workbook with multiple tabs.
5. Google Sheets (optional) appends the JSON rows to a live spreadsheet for collaboration.

### Setup (only 2 connections)

1️⃣ Connect Google Sheets (OAuth2)

- In n8n → Credentials → New → Google Sheets (OAuth2)
- Sign in with your Google account and grant access
- Copy the example sheet referenced in the Google Sheets node (open the node and duplicate the linked sheet), or select your own
- In the workflow's Google Sheets node, select your Spreadsheet and Worksheet

https://docs.google.com/spreadsheets/d/1G6FSm3VdMZt6VubM6g8j0mFw59iEw9npJE0upxj3Y6k/edit?gid=1978181834#gid=1978181834

2️⃣ Connect Google Drive (OAuth2)

- In n8n → Credentials → New → Google Drive (OAuth2)
- Sign in with the Google account that will store your Excel outputs and allow access
- In your Drive-related nodes (if used), point to the folder where you want the .xlsx saved or retrieved

### Customize the workflow

- Replace the sample arrays in the Code nodes with your data (APIs, DBs, CSVs, etc.)
- Rename sheetName in each Convert to File node to match your desired tab names
- Keep the Merge node in Combine All mode to produce a single workbook
- In Google Sheets, switch to Manual mapping for strict column order (optional)

### Best practices (per template guidelines)

- **Rename nodes** to clear, action-oriented names (e.g., "Build People Sheet", "Build Locations Sheet")
- Add a yellow Sticky Note at the top with this description so users see setup in-workflow
- **Do not hardcode credentials** inside HTTP nodes; always use n8n Credentials
- Remove personal IDs/links before publishing

### Sticky Note (copy-paste)

> Multi-Tab Excel Builder (Google Drive + Google Sheets)
> This workflow generates two datasets (Code → JSON), converts each to an Excel sheet, merges them into a single workbook with multiple tabs, and optionally appends rows to Google Sheets.
>
> Setup (2 connections):
> 1) Google Sheets (OAuth2): Create credentials → duplicate/select your target spreadsheet → set Spreadsheet + Worksheet in the node.
> 2) Google Drive (OAuth2): Create credentials → choose the folder for storing/retrieving the .xlsx.
>
> Customize: Edit the Code nodes' arrays, rename tab names in Convert to File, and adjust the Sheets node mapping as needed.

### Troubleshooting

- **Missing columns / wrong order:** Use Manual mapping in the Google Sheets node
- **Binary not found:** Ensure each Convert to File node's binaryPropertyName matches what Merge expects
- **Permissions errors:** Re-authorize Google credentials; confirm you have edit access to the target Sheet/Drive folder

### 📬 Contact

Need help customizing this (e.g., filtering by campaign, sending reports by email, or formatting your PDF)?
📧 rbreen@ynteractive.com 🔗 https://www.linkedin.com/in/robert-breen-29429625/ 🌐 https://ynteractive.com
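For reference, one of the dataset Code nodes might look like the sketch below. The field names and rows are illustrative; each returned item becomes one row in the worksheet produced by the downstream Convert to File node.

```javascript
// Sketch of a dataset Code node (e.g., "Build People Sheet").
const people = [
  { name: 'Ada Lovelace', role: 'Analyst', location: 'London' },
  { name: 'Grace Hopper', role: 'Engineer', location: 'New York' },
];

// One n8n item per row; the Convert to File node turns these into an Excel sheet.
return people.map((person) => ({ json: person }));
```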
by Simeon
# Google Calendar MCP – Context-Aware Calendar Operations

This n8n template implements an MCP (Model Context Protocol)-compliant module for managing Google Calendar events in a context-aware, conflict-free manner.

### 🧠 What It Does

This MCP enables structured interaction with Google Calendar based on context and intent, ensuring reliable, reusable operations with awareness of existing data and state.

### ✅ Core Capabilities

- **Context-aware event creation**: Prevents overlaps by validating time availability before creating new events (see the sketch below).
- **Gap validation**: Checks whether a time range is busy or free, enabling smarter scheduling decisions.
- **Conditional updates**: Only updates events after confirming their existence and current state.
- **Safe deletion**: Removes events following MCP principles of validation and traceability.

### 🚀 How to Use

To use this MCP in your context-aware systems:

1. Deploy the template in your n8n instance.
2. Locate the Server node in the workflow; it exposes a Server-Sent Events (SSE) URL.
3. Copy that SSE URL.
4. Use that URL as the entry point for your MCP client or orchestrator.

This URL acts as the communication bridge, allowing you to interact with the MCP-compliant Google Calendar logic using standard MCP semantics.
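The gap-validation idea behind context-aware creation boils down to an interval-overlap check. The sketch below assumes Google Calendar events with `start.dateTime`/`end.dateTime` (or `start.date` for all-day events), as returned by a Get Many Events operation; it is an illustration, not the template's exact expression.

```javascript
// A slot is free only if it overlaps no existing event.
function isSlotFree(proposedStart, proposedEnd, events) {
  const start = new Date(proposedStart);
  const end = new Date(proposedEnd);
  return events.every((event) => {
    const evStart = new Date(event.start.dateTime || event.start.date);
    const evEnd = new Date(event.end.dateTime || event.end.date);
    // Two intervals overlap when each starts before the other ends.
    return !(start < evEnd && evStart < end);
  });
}
```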
by Stéphane Heckel
# Emailing Using Google Sheet, Google Docs, and SMTP

Automate personalized email campaigns using a Google Sheets contact list, a Google Docs template, and SMTP delivery.

### How It Works

- **Google Docs** is used as the email template with the variables {{firstname}}, {{lastname}}, {{company}}, and {{email}}.
- **Google Sheets** contains your list of recipients (one per row).
- For each contact, the workflow merges personal data into the Google Docs template (see the sketch below).
- Email is sent to each recipient via SMTP (batch size: 1). Use the Wait node to respect provider quotas.
- After sending, the workflow updates the "process" column of the Google Sheet with the date/time.

### How to Use

1. Copy Templates: Google Docs Template, Google Sheet Template. Find each document's ID (the text after /d/ and before /edit in the URL).
2. Configure Workflow: Enter your Google Docs and Google Sheets IDs in the settings node. Set your email subject in the appropriate parameter.
3. Set Up Credentials: Connect your Google account. Configure the SMTP node with your mail server details.
4. Update Data: Edit the Google Docs template with your message and variables. Prepare your Google Sheet with these columns: email, firstname, lastname, company.
5. Deploy and Test: Connect all nodes. Test with a small contact batch. Troubleshoot any node errors (indicated in red in n8n).

### Requirements

- **Google credentials & permissions**: For Sheets and Docs access.
- **SMTP server**: For email delivery (adjust the Wait node for rate limits).
- **n8n version**: Tested on 1.105.2 (Ubuntu).

### Need Help?

Contact me on LinkedIn or ask in the n8n Community Forum!
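The merge step amounts to a simple template substitution: the Google Docs body is treated as text and the {{...}} variables are replaced with values from the current sheet row. A minimal sketch (the exact expression used in the workflow may differ):

```javascript
// Replace template variables with a contact row from the sheet.
function mergeTemplate(template, contact) {
  return template
    .replace(/{{firstname}}/g, contact.firstname)
    .replace(/{{lastname}}/g, contact.lastname)
    .replace(/{{company}}/g, contact.company)
    .replace(/{{email}}/g, contact.email);
}

// Example:
// mergeTemplate(docBody, { firstname: 'Jane', lastname: 'Doe', company: 'Acme', email: 'jane@acme.com' });
```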
by Usman Liaqat
This workflow enables seamless, bidirectional communication between WhatsApp and Slack using n8n. It automates the reception, processing, and forwarding of messages (text, media, and documents) between users on WhatsApp and private Slack channels.

### Key Features & Flow

**1. WhatsApp to Slack Flow**

- Trigger: The workflow starts with a WhatsApp Trigger node that listens for new incoming messages via a webhook.
- Channel Handling: It checks whether a Slack channel named after the WhatsApp sender's number exists. If not, it creates a private Slack channel with the sender's number as the name.
- Message Type Routing: A Switch node (Message Type) inspects the message type (text, image, audio, document) and routes accordingly:
  - Text: Sends the message directly to Slack.
  - Image/Audio/Document: Retrieves the media URL via the WhatsApp API → downloads the media → uploads it to the appropriate Slack channel.

**2. Slack to WhatsApp Flow**

- Trigger: A Slack Trigger listens for new messages or file uploads in Slack.
- Message Type Routing: A second Switch node (Checking Message Type) checks whether the message is text or media.
- Routing Logic:
  - Text Message: Extracts the text and forwards it to the WhatsApp contact (identified by the Slack channel name).
  - Media/File Message: Retrieves the media file URL from Slack → downloads it → sends it as a document via the WhatsApp API.

### Key Integrations

- WhatsApp Cloud API: For receiving messages, downloading media, and sending messages.
- Slack API: For creating/getting channels, posting messages, and uploading files.
- HTTP Request Node: Used to securely download media from Slack and WhatsApp servers with proper authentication.

### Automation Use Case

This workflow is ideal for businesses that handle customer support or conversations over WhatsApp and want to log, respond, and collaborate using Slack as their internal communication tool.
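The routing decisions after the WhatsApp Trigger could look roughly like the sketch below. The payload paths follow the WhatsApp Cloud API webhook format, but the n8n trigger node may already flatten the structure, so treat the field names as assumptions to adapt.

```javascript
// Sketch: derive the Slack channel name and message type from an incoming WhatsApp message.
const message = $json.messages ? $json.messages[0] : $json;

// Slack channel names are lowercase and cannot contain "+", so the sender's
// number is normalised before the "find or create channel" step.
const channelName = String(message.from).replace(/[^0-9]/g, '');

// The Switch node branches on this value: text goes straight to Slack,
// image/audio/document go through the media-URL lookup and download first.
const messageType = message.type; // 'text' | 'image' | 'audio' | 'document'

return { json: { channelName, messageType, text: message.text?.body || null } };
```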
by Nabin Bhandari
This template uses VAPI and Cal.com to book appointments through a voice conversation. It detects whether the user wants to check availability or book an appointment, then responds naturally with real-time scheduling options.

### Who is this for?

This workflow is perfect for:

- Voice assistant developers
- AI receptionists and smart concierge tools
- Service providers (salons, clinics, coaches) needing hands-free scheduling
- Anyone building voice-based customer experiences

### What does it do?

This workflow turns a natural voice conversation into a working appointment system.

1. It starts with a Webhook connected to your VAPI voice agent.
2. The Set node extracts user intent (like "check availability" or "book now"); see the sketch below.
3. A Switch node branches logic based on the intent.
4. If the user wants to check availability, the workflow fetches available times from Cal.com.
5. If the user wants to book, it creates a new event using Cal.com's API.
6. The final result is sent back to VAPI as a conversational voice response.

### How to use it

1. Import this workflow into your n8n instance.
2. Set up a Webhook node and connect it to your VAPI voice agent.
3. Add your Cal.com API token as a credential (use HTTP Header Auth).
4. Deploy and test using VAPI's simulator or real phone input.
5. (Optional) Customize the OpenAI prompt if you're using it to process or moderate inputs.

### Requirements

- A working VAPI agent
- A Cal.com account with API access
- n8n (cloud or self-hosted)
- An understanding of how to configure webhook and API credentials in n8n

### Customization Ideas

- Swap out Cal.com with another booking API (like Calendly)
- Add a Google Sheets or Supabase node to log appointments
- Use OpenAI to summarize or sanitize voice inputs before proceeding
- Build multi-turn conversations in VAPI for more complex bookings
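The intent-extraction step could be expressed like the sketch below. The exact shape of the VAPI request body depends on how your assistant's function/tool call is configured, so every field path here is a placeholder to adapt to your actual payload.

```javascript
// Sketch: pull the caller's intent and booking details out of the VAPI webhook body.
const body = $json.body || $json;

const intent = body.intent || '';       // e.g. "check_availability" or "book"
const requestedStart = body.startTime;  // ISO timestamp supplied by the caller
const attendeeName = body.name;
const attendeeEmail = body.email;

// The Switch node routes on `intent`: availability lookups query Cal.com's slots,
// bookings create a new Cal.com event.
return { json: { intent, requestedStart, attendeeName, attendeeEmail } };
```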
by Sidetool
This workflow is a supporting automation for a common Airtable situation that, as of this writing, has no direct solution despite great demand. Interfaces are your secret weapon for managing a variety of tasks, from sales funnels and task tracking to creating dynamic dashboards. But here's a common situation: how do you efficiently bulk upload records (like contacts, leads, or clients) from an interface with just a click? Once set up, you'll be able to upload CSV files directly to your tables from Interfaces with ease.

### Workflow Key Points

1. **Bulk Upload Functionality**: Say goodbye to the limitations of standard Airtable interfaces. Now you can upload multiple leads or contacts simultaneously, making your work swift and efficient.
2. **Customizable Fields**: Tailor the base to meet your specific data needs. This ensures seamless integration with your existing systems and simplifies data management.

Perfect for teams in e-commerce, CRM, or any sector where managing a high volume of leads or contacts is key. Our Airtable base is designed to eliminate the tediousness of importing contacts. It makes large-scale data management straightforward, saving you precious time and hassle. Get ready to streamline your operations and boost your productivity! 🚀💡
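Under the hood, bulk creation against Airtable means sending parsed CSV rows in batches of up to 10 records per request (the API's per-request limit). The sketch below is a hedged illustration; the base ID, table name, and field names are placeholders, and in the workflow the CSV parsing is normally handled by an Extract From File / Spreadsheet File node.

```javascript
// Sketch: push parsed CSV rows to Airtable in batches of 10.
const AIRTABLE_TOKEN = process.env.AIRTABLE_TOKEN;
const BASE_ID = 'appXXXXXXXXXXXXXX';   // placeholder base ID
const TABLE = 'Contacts';              // placeholder table name

async function bulkCreate(rows) {
  for (let i = 0; i < rows.length; i += 10) {
    const batch = rows.slice(i, i + 10).map((row) => ({
      fields: { Name: row.name, Email: row.email, Company: row.company },
    }));
    const res = await fetch(`https://api.airtable.com/v0/${BASE_ID}/${encodeURIComponent(TABLE)}`, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${AIRTABLE_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ records: batch }),
    });
    if (!res.ok) throw new Error(`Airtable error: ${res.status}`);
  }
}
```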
by Sk developer
# 🚀 LinkedIn Video to MP4 Automation with Google Drive & Sheets | RapidAPI Integration

This n8n workflow automatically converts LinkedIn video URLs into downloadable MP4 files using the LinkedIn Video Downloader API, uploads them to Google Drive with public access, and logs both the original URL and the Google Drive link into Google Sheets. It leverages the LinkedIn Video Downloader service for fast and secure video extraction.

### 📝 Node Explanations (Single-Line)

1️⃣ On form submission → Captures the LinkedIn video URL from the user via a web form.
2️⃣ HTTP Request → Calls LinkedIn Video Downloader to fetch downloadable MP4 links.
3️⃣ If → Checks for API errors and routes the workflow accordingly.
4️⃣ Download mp4 → Downloads the MP4 video file from the API response URL.
5️⃣ Upload To Google Drive → Uploads the downloaded MP4 file to Google Drive.
6️⃣ Google Drive Set Permission → Makes the uploaded file publicly accessible.
7️⃣ Google Sheets → Logs successful conversions with the LinkedIn URL and sharable Drive link.
8️⃣ Wait → Delays execution before logging failed attempts.
9️⃣ Google Sheets Append Row → Logs failed video downloads with an N/A Drive link.

### 📄 Google Sheets Columns

- **URL** → Original LinkedIn video URL entered in the form.
- **Drive_URL** → Publicly sharable Google Drive link to the converted MP4 file. For failed downloads, Drive_URL will display N/A.

### 💡 Use Case

Automate LinkedIn video downloading and sharing with LinkedIn Video Downloader for social media managers, marketers, and content creators, without manual file handling.

### ✅ Benefits

**Time-saving** (auto-download & upload), **centralized tracking** in Sheets, **easy sharing** via Drive links, and **error logging** for failed downloads, all powered by the **RapidAPI LinkedIn Video Downloader**.
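The If node's error check could be expressed like the sketch below. Response shapes vary between RapidAPI video-downloader providers, so the field names (`error`, `download_url`, `source_url`) are assumptions to adjust to the actual API payload.

```javascript
// Sketch: route the workflow based on the downloader API response.
const response = $json;

if (response.error || !response.download_url) {
  // Failed path: the workflow waits, then appends a row with Drive_URL = "N/A".
  return { json: { success: false, url: response.source_url || null, driveUrl: 'N/A' } };
}

// Success path: the MP4 URL feeds the download and Google Drive upload steps.
return { json: { success: true, mp4Url: response.download_url } };
```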
by Miquel Colomer
This workflow extracts data from a multi-page website. The workflow:

1. Starts from the country list at https://www.theswiftcodes.com/browse-by-country/.
2. Loads every country page (e.g., https://www.theswiftcodes.com/albania/).
3. Paginates through every page within each country.
4. Extracts the data from each country page.
5. Saves the data to MongoDB.
6. Repeats the pagination across all pages in all countries.

It uses the getWorkflowStaticData('global') method to recover the next page (saved during the previous page's run) and then continues through the remaining pages; a sketch of this bookkeeping follows below. There is a first section where the country list is retrieved and extracted. Then the workflow checks whether a locally cached copy of a page is available and, if so, reads the cached page from disk. Finally, data is saved to MongoDB, and pagination continues through all pages of the country and across all countries.

I have applied a cache system that saves each visited page to the n8n local disk. If I relaunch the workflow, it checks whether a cache file exists so it can skip requests to the website that are no longer needed. If the data on the website changes, you can add a Cron node to check the website once per week.

Before inserting data into MongoDB, the best way to avoid duplicates is to check that the swift_code (the primary key of the collection) doesn't already exist.

I recommend using a proxy for all requests to avoid IP blocks. A good solution for proxy plus IP rotation is scrapoxy.io.

This workflow is perfect for small data requirements. If you need to scrape dynamic data, you can use a headless browser or any other service. If you want to scrape huge lists of URIs, I recommend using Scrapy + Scrapoxy.
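The pagination bookkeeping with workflow static data looks roughly like the sketch below. Workflow static data persists between runs of an active workflow, which is what lets the "next page" saved on one pass be picked up by the next; the field names here are illustrative, not the template's exact code.

```javascript
// Sketch of the pagination state kept inside a Function node.
const staticData = getWorkflowStaticData('global');
staticData.nextPages = staticData.nextPages || {};

const country = items[0].json.country;       // e.g. "albania"
const nextPage = items[0].json.nextPageUrl;  // extracted from the page's pagination links

// Remember where to continue for this country; null means the country is done.
staticData.nextPages[country] = nextPage || null;

return [{ json: { country, nextPage: staticData.nextPages[country] } }];
```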