Scrape and store data from multiple website pages

Author: Miquel Colomer

This workflow extracts data from a website that spans multiple pages.

The workflow:

Starts from the country list at https://www.theswiftcodes.com/browse-by-country/.
Loads each country page (for example, https://www.theswiftcodes.com/albania/).
Follows the pagination within each country.
Extracts data from each page.
Saves the data to MongoDB.
Repeats until all pages of all countries have been processed.
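
The steps above amount to a nested loop: an outer loop over countries and an inner loop that follows each country's pagination. The following is a minimal sketch in plain JavaScript, not the workflow's actual nodes. The names `crawl`, `loadPage`, `rows` and `nextUrl` are hypothetical, the paths and rows are placeholders, and the page loader is injected so the traversal can run without touching the real site.

```javascript
// Minimal sketch of the crawl order, assuming each loaded page yields its
// extracted rows plus the URL of the next page (or null on the last page).
// `loadPage` is a hypothetical injected function: url -> { rows, nextUrl }.
function crawl(countryUrls, loadPage, save) {
  let pagesVisited = 0;
  for (const countryUrl of countryUrls) {
    let url = countryUrl;
    while (url) {
      const page = loadPage(url);
      save(page.rows);
      pagesVisited += 1;
      url = page.nextUrl; // null ends this country and moves on to the next
    }
  }
  return pagesVisited;
}

// In-memory stand-in for the website; paths and rows are placeholders.
const fakeSite = {
  '/albania/': { rows: ['A1', 'A2'], nextUrl: '/albania/page/2/' },
  '/albania/page/2/': { rows: ['A3'], nextUrl: null },
  '/andorra/': { rows: ['B1'], nextUrl: null },
};

const stored = [];
const visited = crawl(
  ['/albania/', '/andorra/'],
  (url) => fakeSite[url],
  (rows) => stored.push(...rows)
);
```

Injecting the loader keeps the loop independent of how a page is obtained, so the same traversal works whether the page comes from an HTTP request or from a local cache.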

It uses the getWorkflowStaticData('global') method to recover the URL of the next page (saved while processing the previous page) and keeps going until every page has been visited.
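
As an illustration of this pattern, the sketch below stores and recovers a next-page pointer. Inside n8n, getWorkflowStaticData is supplied by the platform; here a plain object stands in for it so the snippet runs on its own. The key name `nextPage`, the helper names and the example.com URL are assumptions made for the sketch, not taken from the workflow.

```javascript
// Stand-in for n8n's getWorkflowStaticData('global'). Inside n8n the function
// is provided for you; this stub only exists so the sketch is runnable.
const staticStore = {};
function getWorkflowStaticData(scope) {
  return staticStore; // the stub ignores `scope` and returns one shared object
}

// Save the next page URL found on the page that was just processed.
// `nextPage` is a hypothetical key name.
function rememberNextPage(nextUrl) {
  const data = getWorkflowStaticData('global');
  data.nextPage = nextUrl || null;
}

// Recover the URL to request next; null means pagination is finished.
function recoverNextPage() {
  const data = getWorkflowStaticData('global');
  return data.nextPage || null;
}

rememberNextPage('https://example.com/albania/page/2/'); // placeholder URL
const next = recoverNextPage();

rememberNextPage(undefined); // the last page has no "next" link
const done = recoverNextPage();
```

A loop in the workflow can then keep requesting pages while the recovered value is not null.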

The first section retrieves the country list and extracts the countries from it.

Next, the workflow checks whether a cached copy of the page is available locally and, if so, reads the page from disk.

Finally, the data is saved to MongoDB, and the workflow paginates through all the pages of the country and then through all the countries.

The cache system saves each visited page to the n8n local disk. When the workflow is relaunched, it checks whether a cache file exists and skips unnecessary requests to the website.
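
The cache check can be sketched as a "read if present, otherwise fetch and store" helper. This is an illustration only: a Map stands in for files on the n8n local disk, the file-naming scheme is an assumption, and `cacheFileName`, `getPage` and `fetchPage` are hypothetical names rather than parts of the workflow.

```javascript
// Turn a URL into a deterministic, filesystem-safe cache file name.
// The naming scheme is an assumption made for this sketch.
function cacheFileName(url) {
  return url
    .replace(/^https?:\/\//, '')     // drop the scheme
    .replace(/[^a-zA-Z0-9]+/g, '_')  // collapse unsafe characters
    .replace(/^_+|_+$/g, '') + '.html';
}

// Return the page from the cache when present; otherwise fetch it once
// and store it. `cache` is any Map-like store; `fetchPage` is injected.
function getPage(url, cache, fetchPage) {
  const key = cacheFileName(url);
  if (cache.has(key)) {
    return { body: cache.get(key), fromCache: true };
  }
  const body = fetchPage(url);
  cache.set(key, body);
  return { body, fromCache: false };
}

const cache = new Map(); // stands in for cache files on disk
let requests = 0;
const fetchPage = (url) => {
  requests += 1; // count how many real requests would be made
  return '<html>' + url + '</html>';
};

const first = getPage('https://www.theswiftcodes.com/albania/', cache, fetchPage);
const second = getPage('https://www.theswiftcodes.com/albania/', cache, fetchPage);
```

Here the second call is served from the cache, so only one request is counted for the two lookups.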

If the data on the website changes, you can add a Cron node to re-check the website once a week.

Finally, before inserting data into MongoDB, the best way to avoid duplicates is to check that the swift_code (the primary value of the collection) doesn't already exist.
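
One way to picture that check is a filter that drops any record whose swift_code is already known. The sketch below assumes the existing swift_code values have already been read from the collection into a list. `selectNewRecords` and the `bank` field are hypothetical names, and the codes are made-up placeholders.

```javascript
// Keep only records whose swift_code is neither stored already
// nor repeated earlier in the same batch.
// `existingCodes` is assumed to hold the swift_code values already in the collection.
function selectNewRecords(records, existingCodes) {
  const seen = new Set(existingCodes);
  const fresh = [];
  for (const record of records) {
    if (!record.swift_code || seen.has(record.swift_code)) continue;
    seen.add(record.swift_code);
    fresh.push(record);
  }
  return fresh;
}

// Placeholder data: codes and bank names are invented for the example.
const existing = ['AAAAALTRXXX'];
const scraped = [
  { swift_code: 'AAAAALTRXXX', bank: 'Placeholder Bank A' },
  { swift_code: 'BBBBALTRXXX', bank: 'Placeholder Bank B' },
  { swift_code: 'BBBBALTRXXX', bank: 'Placeholder Bank B' },
];

const toInsert = selectNewRecords(scraped, existing);
```

As a complementary safeguard, a unique index on swift_code in MongoDB would reject duplicates at the database level even if a check like this were skipped.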

I recommend using a proxy for all requests to avoid IP blocks. A good solution for proxy plus IP rotation is scrapoxy.io.

This workflow is well suited to small data requirements. If you need to scrape dynamically rendered content, you can use a headless browser or a similar service.

If you want to scrape huge lists of URIs, I recommend using Scrapy + Scrapoxy.

Downloads: 0
Views: 80009
Quality Score: 8.44
Complexity: beginner
Category: Data Processing
Created: 8/14/2025
Updated: 5/18/2026
