Job Post to Sales Lead Pipeline with Scrape.do, Apollo.io & OpenAI

Lead Sourcing by Job Posts for Outreach with Scrape.do API, OpenAI & Google Sheets

Overview

This n8n workflow automates the complete lead generation process by scraping job postings from Indeed, enriching company data via Apollo.io, identifying decision-makers, and generating personalized LinkedIn outreach messages using OpenAI. It integrates with Scrape.do for reliable web scraping, Apollo.io for B2B data enrichment, OpenAI for AI-powered personalization, and Google Sheets for centralized data storage.

Perfect for: Sales teams, recruiters, business development professionals, and marketing agencies looking to automate their outbound prospecting pipeline.

Workflow Components

  1. ⏰ Schedule Trigger

| Property | Value |
|----------|-------|
| Type | Schedule Trigger |
| Purpose | Automatically initiates workflow on a recurring schedule |
| Frequency | Weekly (Every Monday) |
| Time | 00:00 UTC |

Function: Ensures consistent, hands-off lead generation by running the pipeline automatically without manual intervention.

  2. 🔍 Scrape.do Indeed API

| Property | Value |
|----------|-------|
| Type | HTTP Request (GET) |
| Purpose | Scrapes job listings from Indeed via Scrape.do proxy API |
| Endpoint | https://api.scrape.do |
| Output Format | Markdown |

Request Parameters:

| Parameter | Value | Description |
|-----------|-------|-------------|
| token | API Token | Scrape.do authentication |
| url | Indeed Search URL | Target job search page |
| super | true | Uses residential proxies |
| geoCode | us | US-based content |
| render | true | JavaScript rendering enabled |
| device | mobile | Mobile viewport for cleaner HTML |
| output | markdown | Lightweight text output |

Function: Fetches Indeed job listings with anti-bot bypass, returning clean markdown for easy parsing.
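As a rough illustration of how the parameters above combine into a single GET request, the sketch below assembles the query string with `URLSearchParams`. The function name `buildScrapeDoUrl`, the `SCRAPE_DO_TOKEN` placeholder, and the sample Indeed URL are assumptions for illustration only; in the workflow the HTTP Request node builds this request for you.

```javascript
// Build the Scrape.do request URL from the parameters listed in the table.
// "SCRAPE_DO_TOKEN" and the Indeed URL below are placeholders, not real values.
function buildScrapeDoUrl(token, targetUrl) {
  const params = new URLSearchParams({
    token,
    url: targetUrl, // URLSearchParams percent-encodes the nested URL
    super: "true",
    geoCode: "us",
    render: "true",
    device: "mobile",
    output: "markdown",
  });
  return `https://api.scrape.do/?${params.toString()}`;
}

const scrapeUrl = buildScrapeDoUrl(
  "SCRAPE_DO_TOKEN",
  "https://www.indeed.com/jobs?q=data+engineer&l=Remote&fromage=7"
);
console.log(scrapeUrl);
```

Encoding the target URL matters here: without it, the `&l=Remote` part of the Indeed URL would be read as a Scrape.do parameter instead of part of the page being scraped.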

  3. 📋 Parse Indeed Jobs

| Property | Value |
|----------|-------|
| Type | Code Node (JavaScript) |
| Purpose | Extracts structured job data from markdown |
| Mode | Run once for all items |

Extracted Fields:

| Field | Description | Example |
|-------|-------------|---------|
| jobTitle | Position title | "Senior Data Engineer" |
| jobUrl | Indeed job link | "https://indeed.com/viewjob?jk=abc123" |
| jobId | Indeed job identifier | "abc123" |
| companyName | Hiring company | "Acme Corporation" |
| location | City, State | "San Francisco, CA" |
| salary | Pay range | "$120,000 - $150,000" |
| jobType | Employment type | "Full-time" |
| source | Data source | "Indeed" |
| dateFound | Scrape date | "2025-01-15" |

Function: Parses markdown using regex patterns, filters invalid entries, and deduplicates by company name.
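The template's actual parsing code is not reproduced here, so the following is only a minimal sketch of the two ideas named above: pulling job links out of markdown with a regex, and deduplicating by company name. The markdown link shape, the `JOB_LINK` pattern, and the helper names `extractJobLinks` and `dedupeByCompany` are all assumptions; the real Scrape.do output may be structured differently.

```javascript
// Hypothetical markdown shape: "[Job Title](https://indeed.com/viewjob?jk=ID)".
// Adjust the pattern to whatever the raw Scrape.do output actually contains.
const JOB_LINK =
  /\[([^\]]+)\]\((https?:\/\/[^)\s]*viewjob\?[^)\s]*jk=([A-Za-z0-9]+)[^)\s]*)\)/g;

function extractJobLinks(markdown) {
  const jobs = [];
  for (const match of markdown.matchAll(JOB_LINK)) {
    jobs.push({ jobTitle: match[1].trim(), jobUrl: match[2], jobId: match[3] });
  }
  return jobs;
}

// Keep the first job seen per company; names are compared case-insensitively.
function dedupeByCompany(jobs) {
  const seen = new Set();
  return jobs.filter((job) => {
    const key = (job.companyName || "").trim().toLowerCase();
    if (!key || seen.has(key)) return false; // drop empty names and repeats
    seen.add(key);
    return true;
  });
}

const links = extractJobLinks(
  "[Senior Data Engineer](https://indeed.com/viewjob?jk=abc123)"
);
const unique = dedupeByCompany([
  { companyName: "Acme Corporation", jobTitle: "Senior Data Engineer" },
  { companyName: "acme corporation ", jobTitle: "Data Analyst" },
  { companyName: "", jobTitle: "Unknown" },
]);
console.log(links, unique);
```

Dropping entries with an empty company name at this stage also avoids sending blank names to Apollo later in the pipeline.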

  4. 📊 Add New Company (Google Sheets)

| Property | Value |
|----------|-------|
| Type | Google Sheets Node |
| Purpose | Stores parsed job postings for tracking |
| Operation | Append rows |
| Target Sheet | "Add New Company" |

Function: Creates a historical record of all discovered job postings and companies for pipeline tracking.

  5. 🏢 Apollo Organization Search

| Property | Value |
|----------|-------|
| Type | HTTP Request (POST) |
| Purpose | Enriches company data via Apollo.io API |
| Endpoint | https://api.apollo.io/v1/organizations/search |
| Authentication | HTTP Header Auth (x-api-key) |

Request Body: { "q_organization_name": "Company Name", "page": 1, "per_page": 1 }

Response Fields:

| Field | Description |
|-------|-------------|
| id | Apollo organization ID |
| name | Official company name |
| website_url | Company website |
| linkedin_url | LinkedIn company page |
| industry | Business sector |
| estimated_num_employees | Company size |
| founded_year | Year established |
| city, state, country | Location details |
| short_description | Company overview |

Function: Retrieves comprehensive company intelligence including LinkedIn profiles, industry classification, and employee count.

  6. 📤 Extract Apollo Org Data

| Property | Value |
|----------|-------|
| Type | Code Node (JavaScript) |
| Purpose | Parses Apollo response and merges with original data |
| Mode | Run once for each item |

Function: Extracts relevant fields from Apollo API response and combines with job posting data for downstream processing.
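One way this merge step could look is sketched below. It assumes the search response wraps results in an `organizations` array and uses the response field names listed for the Apollo Organization Search node; the helper name `extractOrgData` and the output keys (`apolloOrgId`, `employeeCount`, `enriched`, and so on) are illustrative and not taken from the template.

```javascript
// Assumption: the response looks like { organizations: [ { id, name, ... } ] }.
function extractOrgData(apolloResponse, job) {
  const org = (apolloResponse?.organizations ?? [])[0];
  if (!org) {
    // No match: keep the job data and flag the record so later nodes can skip it.
    return { ...job, apolloOrgId: null, enriched: false };
  }
  return {
    ...job,
    apolloOrgId: org.id,
    companyName: org.name || job.companyName,
    website: org.website_url ?? null,
    companyLinkedin: org.linkedin_url ?? null,
    industry: org.industry ?? null,
    employeeCount: org.estimated_num_employees ?? null,
    country: org.country ?? null,
    enriched: true,
  };
}

const enriched = extractOrgData(
  { organizations: [{ id: "org_123", name: "Acme Corporation", industry: "software" }] },
  { companyName: "Acme", jobTitle: "Senior Data Engineer" }
);
console.log(enriched);
```

Returning an explicit `enriched: false` record for empty results gives a downstream IF node something concrete to branch on.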

  7. 👥 Apollo People Search

| Property | Value |
|----------|-------|
| Type | HTTP Request (POST) |
| Purpose | Finds decision-makers at target companies |
| Endpoint | https://api.apollo.io/v1/mixed_people/search |
| Authentication | HTTP Header Auth (x-api-key) |

Request Body: { "organization_ids": ["apollo_org_id"], "person_titles": [ "CTO", "Chief Technology Officer", "VP Engineering", "Head of Engineering", "Engineering Manager", "Technical Director", "CEO", "Founder" ], "page": 1, "per_page": 3 }

Response Fields:

| Field | Description |
|-------|-------------|
| first_name | Contact first name |
| last_name | Contact last name |
| title | Job title |
| email | Email address |
| linkedin_url | LinkedIn profile URL |
| phone_number | Direct phone |

Function: Identifies key stakeholders and decision-makers based on configurable title filters.
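To make the "configurable title filters" concrete, here is a small sketch that builds the request body shown above from an organization ID and a title list. The body keys mirror the documented request; the helper name `buildPeopleSearchBody` and the `DEFAULT_TITLES` constant are hypothetical.

```javascript
// Default titles copied from the request body documented for this node.
const DEFAULT_TITLES = [
  "CTO",
  "Chief Technology Officer",
  "VP Engineering",
  "Head of Engineering",
  "Engineering Manager",
  "Technical Director",
  "CEO",
  "Founder",
];

function buildPeopleSearchBody(orgId, titles = DEFAULT_TITLES, perPage = 3) {
  if (!orgId) throw new Error("An Apollo organization id is required");
  return {
    organization_ids: [orgId],
    person_titles: [...titles], // copy so callers cannot mutate the defaults
    page: 1,
    per_page: perPage,
  };
}

// Example: target sales leadership instead of engineering leadership.
const salesBody = buildPeopleSearchBody("org_123", ["VP Sales", "Head of Sales"]);
console.log(JSON.stringify(salesBody));
```

Swapping the title list is the main lever for pointing the same pipeline at a different buyer persona.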

  8. 📝 Format Leads

| Property | Value |
|----------|-------|
| Type | Code Node (JavaScript) |
| Purpose | Structures lead data for outreach |
| Mode | Run once for all items |

Function: Combines person data with company context, creating comprehensive lead profiles ready for personalization.
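A minimal sketch of this combination step follows. It assumes the people come back with the response fields listed for Apollo People Search and that the company record carries `companyName`, `industry`, `country`, and `jobTitle`; the helper name `formatLeads` and the exact output keys are illustrative (the `fullName`, `title`, `companyName`, `industry`, and `jobTitle` keys match the variables the OpenAI node reads).

```javascript
// Combine each person with the company context collected earlier.
function formatLeads(people, company) {
  return people
    .filter((p) => p.first_name || p.last_name) // skip records with no name
    .map((p) => ({
      firstName: p.first_name || "",
      lastName: p.last_name || "",
      fullName: [p.first_name, p.last_name].filter(Boolean).join(" "),
      title: p.title || "",
      linkedinUrl: p.linkedin_url || "",
      companyName: company.companyName,
      industry: company.industry || "",
      country: company.country || "",
      jobTitle: company.jobTitle,
    }));
}

const leads = formatLeads(
  [
    { first_name: "Jane", last_name: "Doe", title: "CTO" },
    { title: "Founder" }, // no name, so it is filtered out
  ],
  { companyName: "Acme Corporation", industry: "software", jobTitle: "Senior Data Engineer" }
);
console.log(leads);
```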

  9. 🤖 Generate Personalized Message (OpenAI)

| Property | Value |
|----------|-------|
| Type | OpenAI Node |
| Purpose | Creates custom LinkedIn connection messages |
| Model | gpt-4o-mini |
| Max Tokens | 150 |
| Temperature | 0.7 |

System Prompt: You are a professional outreach specialist. Write personalized LinkedIn connection request messages. Keep messages under 300 characters. Be friendly, professional, and mention a specific reason for connecting based on their role and company.

User Prompt Variables:

| Variable | Source |
|----------|--------|
| Name | $json.fullName |
| Title | $json.title |
| Company | $json.companyName |
| Industry | $json.industry |
| Job Context | $json.jobTitle |

Function: Generates unique, contextual outreach messages that reference specific hiring activity and company details.
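For readers calling the API directly instead of through the OpenAI node, the settings above translate into a chat completions payload roughly like the one sketched here. The model, token limit, temperature, and system prompt are taken from this section; the exact wording of the user prompt and the helper name `buildChatRequest` are assumptions, since the template's user prompt text is not shown.

```javascript
const SYSTEM_PROMPT =
  "You are a professional outreach specialist. Write personalized LinkedIn " +
  "connection request messages. Keep messages under 300 characters. Be friendly, " +
  "professional, and mention a specific reason for connecting based on their role " +
  "and company.";

// Build a body for POST /v1/chat/completions from one formatted lead.
function buildChatRequest(lead) {
  // Hypothetical user prompt layout using the five documented variables.
  const userPrompt = [
    `Name: ${lead.fullName}`,
    `Title: ${lead.title}`,
    `Company: ${lead.companyName}`,
    `Industry: ${lead.industry}`,
    `Job Context: the company is hiring a ${lead.jobTitle}`,
  ].join("\n");

  return {
    model: "gpt-4o-mini",
    max_tokens: 150,
    temperature: 0.7,
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: userPrompt },
    ],
  };
}

const chatRequest = buildChatRequest({
  fullName: "Jane Doe",
  title: "CTO",
  companyName: "Acme Corporation",
  industry: "software",
  jobTitle: "Senior Data Engineer",
});
console.log(JSON.stringify(chatRequest, null, 2));
```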

  10. 🔗 Merge Lead + Message

| Property | Value |
|----------|-------|
| Type | Code Node (JavaScript) |
| Purpose | Combines lead data with generated message |
| Mode | Run once for each item |

Function: Merges OpenAI response with lead profile, creating the final enriched record.
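The sketch below shows one possible shape for this merge. It assumes the raw chat completions response format (`choices[0].message.content`); the n8n OpenAI node may expose the text under a different path, so check the node's output before reusing it. The helper name `mergeLeadAndMessage` and the `withinLimit` flag are hypothetical additions.

```javascript
const LINKEDIN_LIMIT = 300; // character limit stated in the system prompt

function mergeLeadAndMessage(lead, completion) {
  const raw = completion?.choices?.[0]?.message?.content ?? "";
  // Trim whitespace and strip one pair of wrapping double quotes, if present.
  const message = raw.trim().replace(/^"|"$/g, "");
  return {
    ...lead,
    personalizedMessage: message,
    withinLimit: message.length <= LINKEDIN_LIMIT,
  };
}

const merged = mergeLeadAndMessage(
  { fullName: "Jane Doe", companyName: "Acme Corporation" },
  { choices: [{ message: { content: ' "Hi Jane, saw Acme is hiring data engineers." ' } }] }
);
console.log(merged);
```

The model is asked to stay under 300 characters but is not guaranteed to, so recording `withinLimit` makes over-length messages easy to filter or regenerate.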

  11. 💾 Save Leads to Sheet

| Property | Value |
|----------|-------|
| Type | Google Sheets Node |
| Purpose | Stores final lead data with personalized messages |
| Operation | Append rows |
| Target Sheet | "Leads" |

Data Mapping:

| Column | Data |
|--------|------|
| First Name | Lead's first name |
| Last Name | Lead's last name |
| Title | Job title |
| Company | Company name |
| LinkedIn URL | Profile link |
| Country | Location |
| Industry | Business sector |
| Date Added | Timestamp |
| Source | "Indeed + Apollo" |
| Personalized Message | AI-generated outreach text |

Function: Creates actionable lead database ready for outreach campaigns.
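The column mapping above could be expressed as a small function like the one sketched here. The column names come from the Data Mapping table; the input keys (`firstName`, `linkedinUrl`, `personalizedMessage`, and so on), the helper name `toSheetRow`, and the `YYYY-MM-DD` date format are assumptions.

```javascript
// Map one merged lead onto the "Leads" sheet columns.
function toSheetRow(lead, now = new Date()) {
  return {
    "First Name": lead.firstName,
    "Last Name": lead.lastName,
    "Title": lead.title,
    "Company": lead.companyName,
    "LinkedIn URL": lead.linkedinUrl,
    "Country": lead.country,
    "Industry": lead.industry,
    "Date Added": now.toISOString().slice(0, 10), // UTC date, assumed format
    "Source": "Indeed + Apollo",
    "Personalized Message": lead.personalizedMessage,
  };
}

const row = toSheetRow(
  {
    firstName: "Jane",
    lastName: "Doe",
    title: "CTO",
    companyName: "Acme Corporation",
    linkedinUrl: "https://www.linkedin.com/in/example",
    country: "United States",
    industry: "software",
    personalizedMessage: "Hi Jane, saw Acme is hiring data engineers.",
  },
  new Date("2025-01-15T00:00:00Z")
);
console.log(row);
```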

Workflow Flow

    ⏰ Schedule Trigger
        │
        ▼
    🔍 Scrape.do Indeed API ──► Fetches job listings with JS rendering
        │
        ▼
    📋 Parse Indeed Jobs ──► Extracts company names, job details
        │
        ▼
    📊 Add New Company ──► Saves to Google Sheets (Companies)
        │
        ▼
    🏢 Apollo Org Search ──► Enriches company data
        │
        ▼
    📤 Extract Apollo Org Data ──► Parses API response
        │
        ▼
    👥 Apollo People Search ──► Finds decision-makers
        │
        ▼
    📝 Format Leads ──► Structures lead profiles
        │
        ▼
    🤖 Generate Personalized Message ──► AI creates custom outreach
        │
        ▼
    🔗 Merge Lead + Message ──► Combines all data
        │
        ▼
    💾 Save Leads to Sheet ──► Final storage (Leads)

Configuration Requirements

API Keys & Credentials

| Credential | Purpose | Where to Get |
|------------|---------|--------------|
| Scrape.do API Token | Web scraping with anti-bot bypass | scrape.do/dashboard |
| Apollo.io API Key | B2B data enrichment | apollo.io/settings/integrations |
| OpenAI API Key | AI message generation | platform.openai.com |
| Google Sheets OAuth2 | Data storage | n8n Credentials Setup |

n8n Credential Setup

| Credential Type | Configuration |
|-----------------|---------------|
| HTTP Header Auth (Apollo) | Header: x-api-key, Value: Your Apollo API key |
| OpenAI API | API Key: Your OpenAI API key |
| Google Sheets OAuth2 | Complete OAuth flow with Google |

Key Features

🔍 Intelligent Job Scraping

- **Anti-Bot Bypass:** Residential proxy rotation via Scrape.do
- **JavaScript Rendering:** Full headless browser for dynamic content
- **Mobile Optimization:** Cleaner HTML with mobile viewport
- **Markdown Output:** Lightweight, easy-to-parse format

🏢 B2B Data Enrichment

- **Company Intelligence:** Industry, size, location, LinkedIn
- **Decision-Maker Discovery:** Title-based filtering
- **Contact Information:** Email, phone, LinkedIn profiles
- **Real-Time Data:** Fresh information from Apollo.io

🤖 AI-Powered Personalization

- **Contextual Messages:** References specific hiring activity
- **Character Limit:** Optimized for LinkedIn (300 chars)
- **Tuned Temperature:** A setting of 0.7 balances creativity and consistency
- **Role-Specific:** Tailored to recipient's title and company

📊 Automated Data Management

- **Dual Sheet Storage:** Companies + Leads separation
- **Timestamp Tracking:** Historical records
- **Deduplication:** Prevents duplicate entries
- **Ready for Export:** CSV-compatible format

Use Cases

🎯 Sales Prospecting

- Identify companies actively hiring in your target market
- Find decision-makers at companies investing in growth
- Generate personalized cold outreach at scale
- Track pipeline from discovery to contact

👥 Recruiting & Talent Acquisition

- Monitor competitor hiring patterns
- Identify companies building specific teams
- Connect with hiring managers directly
- Build talent pipeline relationships

📈 Market Intelligence

- Track industry hiring trends
- Monitor competitor expansion signals
- Identify emerging market opportunities
- Benchmark salary ranges by role

🤝 Partnership Development

- Find companies investing in complementary areas
- Identify potential integration partners
- Connect with technical leadership
- Build strategic relationship pipeline

Technical Notes

| Specification | Value |
|---------------|-------|
| Processing Time | 2-5 minutes per run (depending on job count) |
| Jobs per Run | ~25 unique companies |
| API Calls per Run | 1 Scrape.do + 25 Apollo Org + 25 Apollo People + ~75 OpenAI |
| Data Accuracy | 90%+ for company matching |
| Success Rate | 99%+ with proper error handling |

Rate Limits to Consider

| Service | Free Tier Limit | Recommendation |
|---------|-----------------|----------------|
| Scrape.do | 1,000 credits/month | ~40 runs/month |
| Apollo.io | 100 requests/day | Add Wait nodes if needed |
| OpenAI | Based on usage | Monitor costs (~$0.01-0.05/run) |
| Google Sheets | 300 requests/minute | No issues expected |
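The per-run call counts and the free-tier limits can be combined into a quick budget check, sketched below. All inputs are the figures quoted in this document; the "implied credits per run" value is simply 1,000 divided by the recommended 40 runs, not a price published by Scrape.do.

```javascript
// Figures taken from the Technical Notes and Rate Limits tables.
const companiesPerRun = 25;
const maxLeadsPerCompany = 3;
const apolloDailyLimit = 100;
const scrapeDoMonthlyCredits = 1000;
const recommendedRunsPerMonth = 40;

// One org search plus one people search per company.
const apolloCallsPerRun = companiesPerRun * 2; // 50
// One message per lead, at most three leads per company.
const openAiCallsPerRun = companiesPerRun * maxLeadsPerCompany; // 75
// How many full runs fit inside Apollo's daily free-tier limit.
const maxRunsPerDay = Math.floor(apolloDailyLimit / apolloCallsPerRun); // 2
// Credits per run implied by the "~40 runs/month" recommendation.
const impliedCreditsPerRun = scrapeDoMonthlyCredits / recommendedRunsPerMonth; // 25

console.log({ apolloCallsPerRun, openAiCallsPerRun, maxRunsPerDay, impliedCreditsPerRun });
```

On the default weekly schedule the workflow stays well inside these limits; the Apollo daily cap only becomes the bottleneck if the workflow is run more than twice in one day.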

Setup Instructions

Step 1: Import Workflow

1. Copy the JSON workflow configuration.
2. In n8n: Workflows → Import from JSON.
3. Paste the configuration and save.

Step 2: Configure Scrape.do

1. Sign up at scrape.do.
2. Navigate to Dashboard → API Token.
3. Copy your token.
4. The token is embedded in the URL query parameter (already configured).

To customize the search, change the url parameter in the "Scrape.do Indeed API" node:

- q=data+engineer (search term)
- l=Remote (location)
- fromage=7 (last 7 days)
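If you prefer to generate the search URL rather than edit it by hand, a helper along these lines would work. The `q`, `l`, and `fromage` parameters are the ones listed above; the base URL `https://www.indeed.com/jobs` and the helper name `buildIndeedSearchUrl` are assumptions, so compare the result with the URL already configured in the node.

```javascript
// Build an Indeed search URL from a search term, location, and posting age.
function buildIndeedSearchUrl({ query, location, maxAgeDays }) {
  const params = new URLSearchParams({
    q: query, // spaces are encoded as "+"
    l: location,
    fromage: String(maxAgeDays),
  });
  return `https://www.indeed.com/jobs?${params.toString()}`;
}

const searchUrl = buildIndeedSearchUrl({
  query: "data engineer",
  location: "Remote",
  maxAgeDays: 7,
});
console.log(searchUrl);
// https://www.indeed.com/jobs?q=data+engineer&l=Remote&fromage=7
```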

Step 3: Configure Apollo.io

1. Sign up at apollo.io.
2. Go to Settings → Integrations → API Keys.
3. Create a new API key.
4. In n8n: Credentials → Add Credential → Header Auth.
   - Name: x-api-key
   - Value: Your Apollo API key
5. Select this credential in both Apollo HTTP nodes.

Step 4: Configure OpenAI

1. Go to platform.openai.com.
2. Create a new API key.
3. In n8n: Credentials → Add Credential → OpenAI.
4. Paste the API key.
5. Select the credential in the "Generate Personalized Message" node.

Step 5: Configure Google Sheets

1. Create a new Google Spreadsheet.
2. Create two sheets:
   - Sheet 1: "Add New Company". Columns: companyName, jobTitle, jobUrl, location, salary, source, postedDate
   - Sheet 2: "Leads". Columns: First Name, Last Name, Title, Company, LinkedIn URL, Country, Industry, Date Added, Source, Personalized Message
3. Copy the Sheet ID from the URL.
4. In n8n: Credentials → Add Credential → Google Sheets OAuth2.
5. Update both Google Sheets nodes with your Sheet ID.

Step 6: Test and Activate

1. **Manual Test:** Click the "Execute Workflow" button.
2. **Verify Each Node:** Check outputs step by step.
3. **Review Data:** Confirm data appears in Google Sheets.
4. **Activate:** Toggle the workflow to "Active".

Error Handling

Common Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| "Invalid character: " | Empty/malformed company name | Check Parse Indeed Jobs output |
| "Node does not have credentials" | Credential not linked | Open node → Select credential |
| Empty Parse Results | Indeed HTML structure changed | Check Scrape.do raw output |
| Apollo Rate Limit (429) | Too many requests | Add 5-10s Wait node between calls |
| OpenAI Timeout | Too many tokens | Reduce batch size or max_tokens |
| "Your request is invalid" | Malformed JSON body | Verify expression syntax in HTTP nodes |

Troubleshooting Steps

1. **Verify Credentials:** Test each credential individually.
2. **Check Node Outputs:** Use "Execute Node" for debugging.
3. **Monitor API Usage:** Check the Apollo and OpenAI dashboards.
4. **Review Logs:** Check n8n execution history for details.
5. **Test with Sample:** Use a known company name to verify Apollo.

Recommended Error Handling Additions

For production use, consider adding:

- IF node after Apollo Org Search to handle empty results
- Error Workflow trigger for notifications
- Wait nodes between API calls for rate limiting
- Retry logic for transient failures
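Within n8n, retries are usually configured on the node itself or built from Wait and IF nodes. For anyone implementing the retry idea in code, the sketch below shows one common approach, exponential backoff on HTTP 429. The helper names `backoffDelayMs`, `sleep`, and `withRetry`, the 5-second base delay (taken from the "5-10s Wait" suggestion), the 60-second cap, and the `err.status` check are all assumptions.

```javascript
// Delay grows as base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 5000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async request when it fails with a retryable error (default: 429).
async function withRetry(
  request,
  { retries = 3, baseMs = 5000, isRetryable = (err) => err.status === 429 } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      // Give up on non-retryable errors or once the retry budget is spent.
      if (attempt >= retries || !isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
}

// Delays used for the first three retries with the defaults.
console.log([0, 1, 2].map((attempt) => backoffDelayMs(attempt)));
```

A call would look like `await withRetry(() => callApollo(body))`, where `callApollo` is a hypothetical function that throws an error carrying a `status` property on failure.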

Performance Specifications

| Metric | Value |
|--------|-------|
| Execution Time | 2-5 minutes per scheduled run |
| Jobs Discovered | ~25 per Indeed page |
| Leads Generated | 1-3 per company (based on title matches) |
| Message Quality | Professional, contextual, <300 chars |
| Data Freshness | Real-time from Indeed + Apollo |
| Storage Format | Google Sheets (unlimited rows) |

API Reference

Scrape.do API

| Endpoint | Method | Purpose |
|----------|--------|---------|
| https://api.scrape.do | GET | Direct URL scraping |

Documentation: scrape.do/documentation

Apollo.io API

| Endpoint | Method | Purpose |
|----------|--------|---------|
| /v1/organizations/search | POST | Company lookup |
| /v1/mixed_people/search | POST | People search |

Documentation: apolloio.github.io/apollo-api-docs

OpenAI API

| Endpoint | Method | Purpose |
|----------|--------|---------|
| /v1/chat/completions | POST | Message generation |

Documentation: platform.openai.com

Complexity: beginner
Created: 12/27/2025
Updated: 1/1/2026
