Extract Structured Data from D&B Company Reports with GPT-4o
Pull a Dun & Bradstreet Business Information Report (PDF) by DUNS, convert the response into a binary PDF file, extract readable text, and use OpenAI to return a clean, flat JSON with only the key fields you care about (e.g., report date, Paydex, viability score, credit limit). Includes Sticky Notes for quick setup help and guidance.
What this template does
Requests a D&B report (PDF) for a specific DUNS via HTTP
Converts the API response into a binary PDF file
Extracts the text from the PDF for analysis
Uses OpenAI with a Structured Output Parser to return a flat JSON
Designed to be extended to Sheets, databases, or CRMs
How it works (node-by-node)
Manual Trigger: Runs the workflow on demand ("When clicking 'Execute workflow'").
D&B Report (HTTP Request): Calls the D&B Reports API for a Business Information Report (PDF).
Convert to PDF File (Convert to File): Turns the D&B response payload into a binary PDF.
Extract Binary (Extract from File): Extracts text content from the PDF.
OpenAI Chat Model: Provides the language model for the analyzer.
Analyze PDF (AI Agent): Reads the extracted text and applies strict rules for a flat JSON output.
Structured Output (AI Structured Output Parser): Enforces a schema and validates/auto-fixes the JSON shape.
(Optional) Get Bearer Token (HTTP Request): Template guidance for OAuth token retrieval (shown as disabled; included for reference if you prefer Bearer flows).
Setup instructions (from the JSON)
- D&B Report (HTTP Request)
Auth: Header Auth (use an n8n HTTP Header Auth credential)
URL:
https://plus.dnb.com/v1/reports/duns/804735132?productId=birstd&inLanguage=en-US&reportFormat=PDF&orderReason=6332&tradeUp=hq&customerReference=customer%20reference%20text
Headers:
Accept: application/json
Credential Example: D&B (HTTP Header Auth)
> Put your Authorization: Bearer <token> header inside this credential, not directly in the node.
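For readers who want to reproduce the request outside n8n, here is a minimal Python sketch that rebuilds the same URL and headers. The function names `build_report_url` and `build_headers` are hypothetical helpers, not part of the template, and the sketch only constructs the request; it does not call the D&B API.

```python
from urllib.parse import quote, urlencode

# Base path taken from the URL shown in the template.
BASE_URL = "https://plus.dnb.com/v1/reports/duns"


def build_report_url(duns: str, customer_reference: str = "customer reference text") -> str:
    """Rebuild the report URL with the query parameters used in the template."""
    params = {
        "productId": "birstd",
        "inLanguage": "en-US",
        "reportFormat": "PDF",
        "orderReason": "6332",
        "tradeUp": "hq",
        "customerReference": customer_reference,
    }
    # quote_via=quote encodes spaces as %20, matching the template URL.
    return f"{BASE_URL}/{duns}?{urlencode(params, quote_via=quote)}"


def build_headers(token: str) -> dict:
    """Headers the node sends; in n8n the Authorization header lives in the credential."""
    return {"Accept": "application/json", "Authorization": f"Bearer {token}"}


print(build_report_url("804735132"))
```

Changing the DUNS or product then means changing one argument or one dictionary entry rather than editing the query string by hand.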
- Convert to PDF File (Convert to File)
Operation: toBinary
Source Property: contents[0].contentObject
> This takes the PDF content from the D&B API response and converts it to a binary file for downstream nodes.
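The conversion step can be sketched in Python as follows. This assumes that `contents[0].contentObject` holds the PDF as a base64-encoded string, which is the input n8n's toBinary operation works with; the helper name `pdf_bytes_from_response` is hypothetical.

```python
import base64


def pdf_bytes_from_response(payload: dict) -> bytes:
    """Decode contents[0].contentObject (assumed base64) into raw PDF bytes."""
    encoded = payload["contents"][0]["contentObject"]
    pdf = base64.b64decode(encoded)
    # Every PDF file starts with the "%PDF" marker; fail early if it is missing.
    if not pdf.startswith(b"%PDF"):
        raise ValueError("decoded content does not look like a PDF")
    return pdf


# Fabricated stand-in payload for illustration only, not real D&B data.
fake_payload = {"contents": [{"contentObject": base64.b64encode(b"%PDF-1.7 demo").decode("ascii")}]}
pdf_bytes = pdf_bytes_from_response(fake_payload)
```

The `%PDF` check is a cheap way to catch a wrong source property before the text-extraction step runs.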
- Extract Binary (Extract from File)
Operation: pdf
> Produces a text field with the extracted PDF content, ready for AI analysis.
- OpenAI Model(s)
OpenAI Chat Model
Model: gpt-4o (as configured in the JSON)
Credential: Your stored OpenAI API credential (do not hardcode keys)
Wiring:
Connect OpenAI Chat Model as ai_languageModel to Analyze PDF
Connect another OpenAI Chat Model (also gpt-4o) as ai_languageModel to Structured Output
- Analyze PDF (AI Agent)
Prompt Type: define
Text:
={{ $json.text }}
System Message (rules):
You are a precision extractor. Read the provided business report PDF and return only a single flat JSON object with the fields below.
No arrays/lists.
No prose.
If a value is missing, output null.
Dates: YYYY-MM-DD.
Numbers: plain numerics (no commas or $).
Prefer most recent or highest-level overall values if multiple are shown.
Never include arrays, nested structures, or text outside of the JSON object.
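The rules above can also be checked mechanically once the model has answered. The following is a rough sketch, not part of the template: `check_flat` is a hypothetical helper, and treating the fields in `NUMERIC_FIELDS` as numeric and any key ending in `_date` as a date are assumptions made for illustration.

```python
import json
import re

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
NUMBER_RE = re.compile(r"^-?\d+(\.\d+)?$")

# Assumption: these schema fields are expected to hold plain numerics.
NUMERIC_FIELDS = {
    "viability_score",
    "portfolio_comparison_score",
    "paydex_3mo",
    "paydex_24mo",
    "credit_limit_conservative",
}


def check_flat(raw: str) -> list:
    """Return a list of rule violations; an empty list means the output passes."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON (text outside the object?)"]
    if not isinstance(obj, dict):
        return ["top-level value must be a single JSON object"]
    problems = []
    for key, value in obj.items():
        if isinstance(value, (list, dict)):
            problems.append(f"{key}: arrays and nested objects are not allowed")
        elif value is None:
            continue  # null is the agreed marker for a missing value
        elif key.endswith("_date"):
            if not (isinstance(value, str) and DATE_RE.match(value)):
                problems.append(f"{key}: expected YYYY-MM-DD")
        elif key in NUMERIC_FIELDS:
            if not NUMBER_RE.match(str(value)):
                problems.append(f"{key}: expected a plain number without commas or $")
    return problems
```

A check like this could run in a Code node or downstream service as a second line of defense behind the parser's auto-fix.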
- Structured Output (AI Structured Output Parser)
JSON Schema Example:
{ "report_date": "", "company_name": "", "duns": "", "dnb_rating_overall": "", "composite_credit_appraisal": "", "viability_score": "", "portfolio_comparison_score": "", "paydex_3mo": "", "paydex_24mo": "", "credit_limit_conservative": "" }
Auto Fix: enabled
Wiring: Connect as ai_outputParser to Analyze PDF
- (Optional) Get Bearer Token (HTTP Request): disabled example
If you prefer fetching tokens dynamically:
Auth: Basic Auth (D&B username/password)
Method: POST
URL: https://plus.dnb.com/v3/token
Body Parameters: grant_type = client_credentials
Headers: Accept: application/json
Downstream usage: Set header Authorization: Bearer {{$json["access_token"]}} in subsequent calls.
> In this template, the D&B Report node uses a Header Auth credential instead. Use one strategy consistently (credentials are recommended for security).
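As an illustration of the optional token flow, the sketch below builds (but does not send) the token request with Python's standard library. Sending the body as form-encoded data is an assumption, since the template only lists a body parameter; `build_token_request` and `bearer_header` are hypothetical names.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://plus.dnb.com/v3/token"


def build_token_request(username: str, password: str) -> Request:
    """Build the POST request for a token using HTTP Basic Auth."""
    basic = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    # Assumption: the grant_type parameter is sent as a form-encoded body.
    body = urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Accept": "application/json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def bearer_header(token_response: dict) -> dict:
    """Turn the token response into the header used by subsequent calls."""
    return {"Authorization": f"Bearer {token_response['access_token']}"}
```

Passing the request to `urllib.request.urlopen` would perform the call; keep the credentials in environment variables or a secret store rather than in source code.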
Output schema (flat JSON)
The analyzer + parser return a single flat object like:
{ "report_date": "2024-12-31", "company_name": "Example Corp", "duns": "123456789", "dnb_rating_overall": "5A2", "composite_credit_appraisal": "Fair", "viability_score": "3", "portfolio_comparison_score": "2", "paydex_3mo": "80", "paydex_24mo": "78", "credit_limit_conservative": "25000" }
Test flow
Click Execute workflow (Manual Trigger).
Confirm D&B Report returns the PDF response.
Check Convert to PDF File for a binary file.
Verify Extract Binary (Extract from File) produces a text field.
Inspect Analyze PDF → Structured Output for valid JSON.
Security notes
Do not hardcode tokens in nodes; use Credentials (HTTP Header Auth or Basic Auth).
Restrict who can execute the workflow if it's accessible from outside your network.
Avoid storing sensitive payloads in logs; mask tokens/headers.
Customize
Map the structured JSON to Google Sheets, Postgres/BigQuery, or a CRM.
Extend the schema with additional fields (e.g., number of employees, HQ address), but keep it flat.
Add validation (Set/IF nodes) to ensure required fields exist before writing downstream.
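The validation and mapping ideas above might look like this in plain Python. This is a sketch under stated assumptions: which fields count as required is a hypothetical choice, and `missing_required` and `to_row` are illustrative helpers rather than n8n features.

```python
# Hypothetical choice of required fields; adjust to your own needs.
REQUIRED_FIELDS = ("report_date", "company_name", "duns")

# Column order for a spreadsheet or table row, following the template's schema.
COLUMNS = [
    "report_date", "company_name", "duns", "dnb_rating_overall",
    "composite_credit_appraisal", "viability_score", "portfolio_comparison_score",
    "paydex_3mo", "paydex_24mo", "credit_limit_conservative",
]


def missing_required(record: dict, required=REQUIRED_FIELDS) -> list:
    """List required fields that are absent, null, or empty strings."""
    return [name for name in required if record.get(name) in (None, "")]


def to_row(record: dict, columns=COLUMNS) -> list:
    """Flatten the record into a row in a fixed column order (missing keys become None)."""
    return [record.get(name) for name in columns]


record = {
    "report_date": "2024-12-31", "company_name": "Example Corp", "duns": "123456789",
    "dnb_rating_overall": "5A2", "composite_credit_appraisal": "Fair",
    "viability_score": "3", "portfolio_comparison_score": "2",
    "paydex_3mo": "80", "paydex_24mo": "78", "credit_limit_conservative": "25000",
}
if not missing_required(record):
    row = to_row(record)  # ready to append to a sheet or insert into a table
```

Because the schema is flat, a fixed column list is enough to map every record to one row.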
Troubleshooting
Missing PDF text? Ensure the Convert to File source property is contents[0].contentObject.
Unauthorized from D&B? Refresh/verify the token; confirm the Header Auth credential contains Authorization: Bearer <token>.
Parser errors? Keep the agent output short and flat; the Structured Output node will auto-fix minor issues.
Different DUNS/product? Update the D&B Report URL query params (duns, productId, etc.).
Sticky Notes (included)
Overview: "Fetch D&B Company Report (PDF) → Convert → Extract → Summarize to Structured JSON (n8n)"
Setup snippets for Data Blocks (optional) and Auth flow
Contact
Need help customizing this (e.g., routing the PDF to Drive, mapping JSON to your CRM, or expanding the schema)?
robert@ynteractive.com
https://www.linkedin.com/in/robert-breen-29429625/
https://ynteractive.com