πŸ’‘πŸŒ Essential Multipage Website Scraper with Jina.ai

πŸ’‘πŸŒ Essential Multipage Website Scraper with Jina.ai

Use responsibly and follow local rules and regulations

This N8N workflow enables automated multi-page website scraping using Jina.ai's powerful web scraping capabilities, with seamless integration to Google Drive for content storage. Here's how it works:

Main Features The workflow automatically scrapes multiple pages from a website's sitemap and saves each page's content as a separate Google Drive document.

Key Components Input Configuration Starts with a sitemap URL (default: https://ai.pydantic.dev/sitemap.xml)** Processes the sitemap to extract individual page URLs Includes filtering options to target specific topics or pages

Scraping Process Uses Jina.ai's web scraper to extract content from each URL Converts webpage content into clean markdown format Extracts page titles automatically for document naming

Storage Integration Creates individual Google Drive documents for each scraped page Names documents using the format "URL - Page Title" Saves content in markdown format for better readability

Usage Instructions Set your target website's sitemap URL in the "Set Website URL" node Configure the "Filter By Topics or Pages" node to select specific content Adjust the "Limit" node (default: 20 pages) to control batch size Connect your Google Drive account Run the workflow to begin automated scraping

Additional Features Built-in rate limiting through the Wait node to prevent overloading servers Batch processing capability for handling large sitemaps

The workflow requires no API key for Jina.ai, making it accessible for immediate use while maintaining responsible scraping practices.

0
Downloads
15071
Views
8.44
Quality Score
beginner
Complexity
Author:Joseph LePage(View Original β†’)
Created:8/14/2025
Updated:11/17/2025

πŸ”’ Please log in to import templates to n8n and favorite templates

Workflow Visualization

Loading...

Preparing workflow renderer

Comments (0)

Login to post comments