Orchestrate Web Crawls with Scrapyd and Automated Data Enrichment

How it works

This workflow runs a spider job in the background via Scrapyd, using a YAML config that defines selectors and parsing rules. When triggered, it schedules the spider with parameters (query, project ID, page limits, etc.) and polls Scrapyd until the job finishes. Once complete, it fetches the output items and enriches them: parsing the JSONL, deduplicating, extracting the ID, part number, make, model, and part name, and normalizing the price. Results are sorted and returned as structured JSON. Optional debug artifacts such as logs, HTML dumps, and screenshots can also be collected.
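The orchestration follows Scrapyd's standard HTTP API: schedule.json starts the job, listjobs.json is polled until the job appears in the finished list, and the JSON Lines feed is fetched once it does. The sketch below shows that same schedule-poll-fetch loop in Python. The project and spider names, the query value, and the local Scrapyd URL are placeholders, and the items URL assumes Scrapyd is configured with an items_dir so feeds are served over HTTP; in the workflow itself these steps are performed by HTTP and Wait nodes.

```python
import time
import requests

SCRAPYD = "http://localhost:6800"   # assumption: local Scrapyd instance
PROJECT = "parts_crawler"           # hypothetical project name
SPIDER = "catalog"                  # hypothetical spider name

# Schedule the spider; extra form fields (q, pages, ...) are passed through
# to the spider as runtime arguments.
resp = requests.post(f"{SCRAPYD}/schedule.json", data={
    "project": PROJECT,
    "spider": SPIDER,
    "q": "brake pads",
    "pages": 3,
})
job_id = resp.json()["jobid"]

# Poll listjobs.json until the job shows up in the "finished" list.
while True:
    jobs = requests.get(f"{SCRAPYD}/listjobs.json",
                        params={"project": PROJECT}).json()
    if any(j["id"] == job_id for j in jobs.get("finished", [])):
        break
    time.sleep(10)

# With items_dir set in scrapyd.conf, the JSON Lines feed is served here.
items_url = f"{SCRAPYD}/items/{PROJECT}/{SPIDER}/{job_id}.jl"
raw_items = requests.get(items_url).text
```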

How to use

Use the manual trigger for testing, or replace it with a webhook, schedule, or other trigger. Adjust the runtime parameters (q, project_id, pages, etc.) directly in the workflow before running. The background spider configuration (the YAML and the spider code) must be updated separately; this workflow only orchestrates and enriches results, it does not define the scraping logic.
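If you swap the manual trigger for a webhook, the runtime parameters can be supplied by the caller instead of being edited in the workflow. A minimal sketch, assuming a hypothetical webhook URL and a payload built from the parameters named above:

```python
import requests

# Hypothetical n8n webhook URL; q, project_id, and pages mirror the
# workflow's runtime parameters described above.
requests.post(
    "https://n8n.example.com/webhook/scrapyd-crawl",
    json={"q": "alternator", "project_id": "parts_crawler", "pages": 2},
)
```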

Requirements

Scrapyd service for job scheduling & status tracking

A deployed spider with a valid YAML config (adjust selectors there)

JSON Lines output (items.jl) from the spider (see the parsing and enrichment sketch after this list)

Endpoints for optional artifacts (logs, HTML, screenshots)

n8n with the HTTP Request, Wait, Code, and Aggregate nodes enabled
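For reference, the enrichment step described above (parse the JSON Lines feed, normalize prices, deduplicate, sort) boils down to logic like the following. In the workflow it lives in an n8n Code node; this Python sketch only illustrates the shape of it, and the field names (part_number, make, model, part_name, price) are assumptions to be matched against whatever your spider actually emits.

```python
import json

def enrich(raw_jsonl: str) -> list[dict]:
    """Parse items.jl, normalize prices, deduplicate, and sort."""
    items = []
    for line in raw_jsonl.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        # Normalize price strings like "$1,299.00" into a float.
        price = str(row.get("price", "")).replace("$", "").replace(",", "").strip()
        row["price"] = float(price) if price else None
        items.append({
            "id": row.get("id"),
            "part_number": row.get("part_number"),
            "make": row.get("make"),
            "model": row.get("model"),
            "part_name": row.get("part_name"),
            "price": row["price"],
        })

    # Deduplicate on part number, keeping the cheapest offer.
    best: dict[str, dict] = {}
    for item in items:
        key = item["part_number"]
        current = best.get(key)
        if current is None or (item["price"] is not None and
                               (current["price"] is None or item["price"] < current["price"])):
            best[key] = item

    # Sort ascending by price, pushing unpriced items to the end.
    return sorted(best.values(), key=lambda i: (i["price"] is None, i["price"] or 0))
```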

Customising this workflow

Update the YAML config if the target website structure changes

Modify the enrichment code to extract different fields (e.g., categories, ratings)

Adjust deduplication (cheapest, newest, or other logic; see the sketch after this list)

Toggle debug retrieval depending on performance/storage needs

Extend webhook response to integrate with databases, APIs, or downstream workflows
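Changing the deduplication rule mostly means swapping the comparison applied when two items share a part number. A small sketch under the same assumed item shape as above; the scraped_at timestamp used by the "newest" strategy is hypothetical:

```python
from datetime import datetime

def keep_cheapest(current: dict, candidate: dict) -> dict:
    """Prefer the lower price; fall back to whichever item has a price."""
    if current["price"] is None:
        return candidate
    if candidate["price"] is None:
        return current
    return candidate if candidate["price"] < current["price"] else current

def keep_newest(current: dict, candidate: dict) -> dict:
    """Prefer the more recently scraped item (assumes an ISO 'scraped_at' field)."""
    fmt = datetime.fromisoformat
    return candidate if fmt(candidate["scraped_at"]) > fmt(current["scraped_at"]) else current

def deduplicate(items: list[dict], prefer=keep_cheapest) -> list[dict]:
    """Collapse items sharing a part_number using the chosen preference rule."""
    best: dict[str, dict] = {}
    for item in items:
        key = item["part_number"]
        best[key] = item if key not in best else prefer(best[key], item)
    return list(best.values())
```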
