Essential Multipage Website Scraper with Jina.ai
Use responsibly and follow local rules and regulations
This n8n workflow automates multi-page website scraping with Jina.ai's web scraper and stores the scraped content in Google Drive. Here's how it works:
Main Features
The workflow automatically scrapes multiple pages from a website's sitemap and saves each page's content as a separate Google Drive document.
Key Components

Input Configuration
- Starts with a sitemap URL (default: https://ai.pydantic.dev/sitemap.xml)
- Processes the sitemap to extract individual page URLs
- Includes filtering options to target specific topics or pages
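The input stage described above can be sketched outside n8n as follows. This is a minimal illustration, not the workflow's actual node logic: the function names `extract_urls` and `filter_urls`, the keyword-matching rule, and the sample sitemap are all assumptions made for the example. It assumes a standard sitemap that uses the sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def filter_urls(urls: list[str], keywords: list[str]) -> list[str]:
    """Keep URLs containing any keyword (hypothetical filter rule).

    An empty keyword list keeps every URL.
    """
    if not keywords:
        return list(urls)
    lowered = [keyword.lower() for keyword in keywords]
    return [url for url in urls if any(k in url.lower() for k in lowered)]


# Made-up sitemap used only to demonstrate the two helpers.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/agents/</loc></url>
  <url><loc>https://example.com/api/models/</loc></url>
</urlset>"""

if __name__ == "__main__":
    all_urls = extract_urls(SAMPLE_SITEMAP)
    print(filter_urls(all_urls, ["agents"]))
```

Matching on the URL path is only one possible filter; the workflow's "Filter By Topics or Pages" node may apply different conditions.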
Scraping Process
- Uses Jina.ai's web scraper to extract content from each URL
- Converts webpage content into clean markdown format
- Extracts page titles automatically for document naming
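As a rough sketch of the scraping step, the snippet below assumes the scraper is reached through Jina's Reader endpoint by prefixing the page URL with `https://r.jina.ai/`, and that the plain-text response begins with a `Title:` line followed by the markdown body. Both points are assumptions about the service, not details stated in this template, and the helper names are hypothetical. If no `Title:` line is present, the sketch falls back to the first top-level markdown heading.

```python
from urllib.request import urlopen

# Assumed Reader endpoint prefix; not confirmed by the template description.
READER_PREFIX = "https://r.jina.ai/"


def reader_url(page_url: str) -> str:
    """Build the Reader URL for a page by prefixing it."""
    return READER_PREFIX + page_url


def fetch_markdown(page_url: str, timeout: float = 30.0) -> str:
    """Fetch a page through the Reader and return the response text."""
    with urlopen(reader_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_title(reader_text: str, fallback: str = "Untitled") -> str:
    """Pull a title from the response text.

    Looks for a leading 'Title:' line first (assumed response layout),
    then for the first '# ' markdown heading, then gives up.
    """
    lines = reader_text.splitlines()
    for line in lines:
        if line.startswith("Title:"):
            title = line[len("Title:"):].strip()
            if title:
                return title
    for line in lines:
        if line.startswith("# "):
            heading = line[2:].strip()
            if heading:
                return heading
    return fallback
```

`fetch_markdown` performs a real network request, so it is best called once per page with a pause between calls.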
Storage Integration
- Creates individual Google Drive documents for each scraped page
- Names documents using the format "URL - Page Title"
- Saves content in markdown format for better readability
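The "URL - Page Title" naming rule can be illustrated with a small helper. The function name `document_name`, the whitespace cleanup, and the `max_length` cap are assumptions added for the example; the template only specifies the "URL - Page Title" pattern.

```python
def document_name(url: str, title: str, max_length: int = 200) -> str:
    """Build a document name in the form 'URL - Page Title'.

    Collapses runs of whitespace and truncates to max_length characters
    (the cap is a hypothetical safeguard, not part of the template).
    """
    name = f"{url.strip()} - {title.strip()}"
    name = " ".join(name.split())
    return name[:max_length]


if __name__ == "__main__":
    print(document_name("https://example.com/agents/", "Agents"))
```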
Usage Instructions
1. Set your target website's sitemap URL in the "Set Website URL" node
2. Configure the "Filter By Topics or Pages" node to select specific content
3. Adjust the "Limit" node (default: 20 pages) to control batch size
4. Connect your Google Drive account
5. Run the workflow to begin automated scraping
Additional Features
- Built-in rate limiting through the Wait node to prevent overloading servers
- Batch processing capability for handling large sitemaps
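The combined effect of the Limit and Wait nodes can be approximated with a simple loop: cap the number of pages, then pause between requests. The sketch below is an illustration under assumptions. The default limit of 20 mirrors the template, but the one-second delay, the function name, and the injected `fetch` and `sleep` callables are all hypothetical choices made so the example can run without network access.

```python
import time
from itertools import islice
from typing import Callable, Iterable


def scrape_in_sequence(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    limit: int = 20,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Fetch at most `limit` URLs, pausing between consecutive requests."""
    results: dict[str, str] = {}
    for index, url in enumerate(islice(urls, limit)):
        if index > 0:
            # Pause before every request except the first.
            sleep(delay_seconds)
        results[url] = fetch(url)
    return results


if __name__ == "__main__":
    # Stand-in fetch function so the demo does not touch the network.
    pages = scrape_in_sequence(
        ["https://example.com/a", "https://example.com/b"],
        fetch=lambda url: f"content of {url}",
        delay_seconds=0.1,
    )
    print(list(pages))
```

Passing `sleep` as a parameter keeps the pause easy to replace in tests; in normal use the default `time.sleep` applies.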
The workflow requires no API key for Jina.ai, making it accessible for immediate use while maintaining responsible scraping practices.