Extract Structured Data from D&B Company Reports with GPT-4o
Pull a Dun & Bradstreet Business Information Report (PDF) by DUNS, convert the response into a binary PDF file, extract readable text, and use OpenAI to return a clean, flat JSON with only the key fields you care about (e.g., report date, Paydex, viability score, credit limit). Includes Sticky Notes for quick setup help and guidance.
What this template does
Requests a D&B report (PDF) for a specific DUNS via HTTP
Converts the API response into a binary PDF file
Extracts the text from the PDF for analysis
Uses OpenAI with a Structured Output Parser to return a flat JSON
Designed to be extended to Sheets, databases, or CRMs
How it works (node-by-node)
Manual Trigger: Runs the workflow on demand ("When clicking 'Execute workflow'").
D&B Report (HTTP Request): Calls the D&B Reports API for a Business Information Report (PDF).
Convert to PDF File (Convert to File): Turns the D&B response payload into a binary PDF.
Extract Binary (Extract from File): Extracts text content from the PDF.
OpenAI Chat Model: Provides the language model context for the analyzer.
Analyze PDF (AI Agent): Reads the extracted text and applies strict rules for a flat JSON output.
Structured Output (AI Structured Output Parser): Enforces a schema and validates/auto-fixes the JSON shape.
(Optional) Get Bearer Token (HTTP Request): Template guidance for OAuth token retrieval (shown as disabled; included for reference if you prefer Bearer flows).
Setup instructions (from the JSON)
- D&B Report (HTTP Request)
Auth: Header Auth (use an n8n HTTP Header Auth credential)
URL:
https://plus.dnb.com/v1/reports/duns/804735132?productId=birstd&inLanguage=en-US&reportFormat=PDF&orderReason=6332&tradeUp=hq&customerReference=customer%20reference%20text
Headers:
Accept: application/json
Credential Example: D&B (HTTP Header Auth)
> Put your Authorization: Bearer <token> header inside this credential, not directly in the node.
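To make the pieces of that URL easier to change, here is a minimal Python sketch that rebuilds it from the query parameters shown above. The function name `build_report_url` is hypothetical and not part of the template; the parameter values are copied from the template URL, and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://plus.dnb.com/v1/reports/duns"


def build_report_url(duns: str, customer_reference: str = "customer reference text") -> str:
    """Assemble the report URL from the query parameters used in the template."""
    params = {
        "productId": "birstd",
        "inLanguage": "en-US",
        "reportFormat": "PDF",
        "orderReason": "6332",
        "tradeUp": "hq",
        "customerReference": customer_reference,
    }
    # quote_via=quote encodes spaces as %20, matching the URL in the template.
    return f"{BASE_URL}/{duns}?{urlencode(params, quote_via=quote)}"


url = build_report_url("804735132")
print(url)
```

Note that the DUNS is part of the path, while the product and format options are query parameters.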
- Convert to PDF File (Convert to File)
Operation: toBinary
Source Property: contents[0].contentObject
> This takes the PDF content from the D&B API response and converts it to a binary file for downstream nodes.
- Extract Binary (Extract from File)
Operation: pdf
> Produces a text field with the extracted PDF content, ready for AI analysis.
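Outside n8n, the Convert to File step could be approximated as follows. This sketch assumes that contents[0].contentObject holds the PDF as a base64 string, which is what the toBinary operation expects; the helper name `pdf_bytes_from_response` and the sample payload are illustrative only and are not a real D&B response.

```python
import base64


def pdf_bytes_from_response(payload: dict) -> bytes:
    """Return the PDF bytes stored at contents[0].contentObject."""
    content_object = payload["contents"][0]["contentObject"]
    # Assumption: contentObject is a base64-encoded PDF.
    pdf = base64.b64decode(content_object)
    # Every PDF file starts with the "%PDF" magic bytes.
    if not pdf.startswith(b"%PDF"):
        raise ValueError("decoded content does not look like a PDF")
    return pdf


# Tiny stand-in payload for illustration (not a real D&B response).
sample = {
    "contents": [
        {"contentObject": base64.b64encode(b"%PDF-1.7 demo").decode("ascii")}
    ]
}
pdf_bytes = pdf_bytes_from_response(sample)
```

The magic-byte check is a cheap way to catch a wrong source property before the text extraction step fails with a less obvious error.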
- OpenAI Model(s)
OpenAI Chat Model
Model: gpt-4o (as configured in the JSON)
Credential: Your stored OpenAI API credential (do not hardcode keys)
Wiring:
Connect OpenAI Chat Model as ai_languageModel to Analyze PDF
Connect another OpenAI Chat Model (also gpt-4o) as ai_languageModel to Structured Output
- Analyze PDF (AI Agent)
Prompt Type: define
Text:
={{ $json.text }}
System Message (rules):
You are a precision extractor. Read the provided business report PDF and return only a single flat JSON object with the fields below.
No arrays/lists.
No prose.
If a value is missing, output null.
Dates: YYYY-MM-DD.
Numbers: plain numerics (no commas or $).
Prefer most recent or highest-level overall values if multiple are shown.
Never include arrays, nested structures, or text outside of the JSON object.
- Structured Output (AI Structured Output Parser)
JSON Schema Example:
{ "report_date": "", "company_name": "", "duns": "", "dnb_rating_overall": "", "composite_credit_appraisal": "", "viability_score": "", "portfolio_comparison_score": "", "paydex_3mo": "", "paydex_24mo": "", "credit_limit_conservative": "" }
Auto Fix: enabled
Wiring: Connect as ai_outputParser to Analyze PDF
- (Optional) Get Bearer Token (HTTP Request): Disabled example
If you prefer fetching tokens dynamically:
Auth: Basic Auth (D&B username/password)
Method: POST
URL: https://plus.dnb.com/v3/token
Body Parameters: grant_type = client_credentials
Headers: Accept: application/json
Downstream usage: Set header Authorization: Bearer {{$json["access_token"]}} in subsequent calls.
> In this template, the D&B Report node uses a Header Auth credential instead. Use one strategy consistently (credentials are recommended for security).
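For reference, the optional token request described above might look like this in plain Python. It is a sketch under assumptions: the body is sent form-encoded (the template only lists grant_type as a body parameter, so check the D&B documentation for the expected content type), the credentials are placeholders, and the helper names are hypothetical. The request object is only built here, never sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://plus.dnb.com/v3/token"


def build_token_request(username: str, password: str) -> Request:
    """Build (but do not send) the POST request for a bearer token."""
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    # Assumption: the token endpoint accepts a form-encoded body.
    body = urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Accept": "application/json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def bearer_header(token_response: dict) -> dict:
    """Turn the token response into the header used by later calls."""
    return {"Authorization": f"Bearer {token_response['access_token']}"}


request = build_token_request("my-user", "my-pass")  # placeholder credentials
```

In a real run the credentials would come from a secret store or environment variable rather than string literals, in line with the security notes in this template.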
Output schema (flat JSON)
The analyzer + parser return a single flat object like:
{ "report_date": "2024-12-31", "company_name": "Example Corp", "duns": "123456789", "dnb_rating_overall": "5A2", "composite_credit_appraisal": "Fair", "viability_score": "3", "portfolio_comparison_score": "2", "paydex_3mo": "80", "paydex_24mo": "78", "credit_limit_conservative": "25000" }
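If you consume this object outside n8n, a small check can confirm it really is flat and carries exactly the schema keys. The sketch below uses only the standard library; `check_flat` and `EXPECTED_KEYS` are hypothetical names, and the key set is copied from the schema example in this template.

```python
import json

EXPECTED_KEYS = {
    "report_date", "company_name", "duns", "dnb_rating_overall",
    "composite_credit_appraisal", "viability_score",
    "portfolio_comparison_score", "paydex_3mo", "paydex_24mo",
    "credit_limit_conservative",
}


def check_flat(raw: str) -> dict:
    """Parse the model output and reject anything that is not a flat object."""
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("expected a single JSON object")
    nested = [key for key, value in record.items() if isinstance(value, (dict, list))]
    if nested:
        raise ValueError(f"nested values are not allowed: {nested}")
    unknown = set(record) - EXPECTED_KEYS
    missing = EXPECTED_KEYS - set(record)
    if unknown or missing:
        raise ValueError(f"unknown keys {sorted(unknown)}, missing keys {sorted(missing)}")
    return record


sample_output = (
    '{"report_date": "2024-12-31", "company_name": "Example Corp", '
    '"duns": "123456789", "dnb_rating_overall": "5A2", '
    '"composite_credit_appraisal": "Fair", "viability_score": "3", '
    '"portfolio_comparison_score": "2", "paydex_3mo": "80", '
    '"paydex_24mo": "78", "credit_limit_conservative": "25000"}'
)
record = check_flat(sample_output)
```

Null values pass the check, which matches the rule that missing values are output as null rather than omitted.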
Test flow
Click Execute workflow (Manual Trigger).
Confirm D&B Report returns the PDF response.
Check Convert to PDF File for a binary file.
Verify Extract from File produces a text field.
Inspect Analyze PDF → Structured Output for valid JSON.
Security notes
Do not hardcode tokens in nodes; use Credentials (HTTP Header Auth or Basic Auth).
Restrict who can execute the workflow if it's accessible from outside your network.
Avoid storing sensitive payloads in logs; mask tokens/headers.
Customize
Map the structured JSON to Google Sheets, Postgres/BigQuery, or a CRM.
Extend the schema with additional fields (e.g., number of employees, HQ address), but keep it flat.
Add validation (Set/IF nodes) to ensure required fields exist before writing downstream.
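The validation idea can also be sketched in Python for cases where the JSON is processed outside n8n before it reaches a sheet or database. The choice of required fields and the helper name `to_row` are assumptions made for illustration; the template itself does not define which fields are mandatory.

```python
from datetime import date

# Hypothetical choice of mandatory fields; adjust to your own needs.
REQUIRED_FIELDS = ("report_date", "company_name", "duns")
NUMERIC_FIELDS = (
    "viability_score", "portfolio_comparison_score",
    "paydex_3mo", "paydex_24mo", "credit_limit_conservative",
)


def to_row(record: dict) -> dict:
    """Check required fields and convert dates and numbers to native types."""
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    row = dict(record)
    # Dates arrive as YYYY-MM-DD strings per the extractor rules.
    row["report_date"] = date.fromisoformat(record["report_date"])
    for field in NUMERIC_FIELDS:
        value = record.get(field)
        # Keep nulls as None; convert numeric strings such as "80" to floats.
        row[field] = None if value in (None, "") else float(value)
    return row


row = to_row({
    "report_date": "2024-12-31", "company_name": "Example Corp",
    "duns": "123456789", "paydex_3mo": "80", "paydex_24mo": None,
    "credit_limit_conservative": "25000",
})
```

The DUNS is deliberately left as a string, since it is an identifier rather than a quantity and may carry leading zeros.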
Troubleshooting
Missing PDF text? Ensure the Convert to File source property is contents[0].contentObject.
Unauthorized from D&B? Refresh/verify the token; confirm the Header Auth credential contains Authorization: Bearer <token>.
Parser errors? Keep the agent output short and flat; the Structured Output node will auto-fix minor issues.
Different DUNS/product? Update the D&B Report URL: the DUNS in the path and the query params (productId, etc.).
Sticky Notes (included)
Overview: "Fetch D&B Company Report (PDF) → Convert → Extract → Summarize to Structured JSON (n8n)"
Setup snippets for Data Blocks (optional) and Auth flow
Contact
Need help customizing this (e.g., routing the PDF to Drive, mapping JSON to your CRM, or expanding the schema)?
robert@ynteractive.com
https://www.linkedin.com/in/robert-breen-29429625/
https://ynteractive.com