Convert PDF, DOC, and Images to Markdown using Datalab.to API

This n8n workflow converts PDF documents, Word documents (.doc/.docx), and images (.png, .jpg, .webp) to clean markdown text using the datalab.to API. It is suited to AI agents, LLM processing, and RAG (Retrieval-Augmented Generation) data preparation for vector databases.

Workflow Description

Input

- **Trigger Node**: Form trigger or webhook to accept file uploads
- **Supported Formats**: PDF documents, Word documents (.doc/.docx), and images (PNG, JPG, WEBP)
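The type and size check on an uploaded file can be sketched outside n8n as follows. This is a minimal illustration, not the workflow's own code: `validate_upload`, `SUPPORTED_EXTENSIONS`, and the 50 MB `MAX_BYTES` limit are hypothetical names and values, and the real size limit should be taken from the datalab.to documentation.

```python
from pathlib import Path

# Extensions from the supported formats listed above; ".jpeg" is added as an assumption.
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".png", ".jpg", ".jpeg", ".webp"}

# Hypothetical size limit; the actual API limit is not stated in this description.
MAX_BYTES = 50 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    ext = Path(filename).suffix.lower()  # case-insensitive extension check
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > max_bytes:
        problems.append(f"file exceeds {max_bytes} bytes")
    return problems
```

Returning a list of problems instead of raising lets the workflow report every failed check in one response to the uploader.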

Processing Steps

1. File Validation: Check file type and size constraints
2. HTTP Request Node:
   - Method: POST to https://api.datalab.to/v1/marker
   - Headers: X-API-Key with your datalab.to API key
   - Body: Multipart form data with the file
3. Response Processing: Extract the converted markdown text
4. Output Formatting: Clean and structure the markdown for downstream use
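For readers who want to see what the HTTP Request node sends, here is a rough standard-library sketch of the same request. The endpoint and the X-API-Key header come from the steps above. The form field name `file` and the `markdown` response key are assumptions, and the helper names are hypothetical. Nothing is sent over the network here; the code only builds the request.

```python
import json
import mimetypes
import urllib.request
import uuid

API_URL = "https://api.datalab.to/v1/marker"  # endpoint as given in the steps above


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_request(api_key: str, filename: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST request described in the steps above."""
    body, content_type = build_multipart("file", filename, data)  # "file" is an assumed field name
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": content_type},
    )


def extract_markdown(response_text: str) -> str:
    """Pull the converted text out of a JSON response; the "markdown" key is an assumption."""
    payload = json.loads(response_text)
    return payload.get("markdown", "")
```

Passing the request to `urllib.request.urlopen` would send it. If the service responds with a job handle rather than the finished text, a polling step would be needed before `extract_markdown`; the datalab.to documentation is the authority on the actual response shape.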

Output

Clean, structured markdown text ready for:

- Inclusion in LLM prompts
- Vector database ingestion
- AI agent knowledge base processing
- Document analysis workflows
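As one possible reading of "clean and structure" for downstream use, the sketch below tidies whitespace and splits the markdown at headings so that each section can be embedded separately. Both helpers are hypothetical and are not part of the workflow. Heading-based splitting is only one of several chunking strategies.

```python
import re


def tidy_markdown(text: str) -> str:
    """Normalize line endings, strip trailing spaces, collapse 3+ newlines to 2."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"


def split_by_heading(text: str) -> list[str]:
    """Split markdown into chunks, starting a new chunk at each ATX heading line."""
    chunks, current = [], []
    for line in text.splitlines():
        # A line such as "## Section" closes the chunk collected so far.
        if re.match(r"#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]
```

This simple splitter does not track fenced code blocks, so a `#` comment line inside a fence would also start a new chunk. A production pipeline would likely need to handle that case.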

Setup Instructions

1. Get API Access: Sign up at datalab.to to obtain your API key
2. Configure Credentials:
   - Create a new credential in n8n
   - Add Generic Header: X-API-Key with your API key as the value
3. Import Workflow: Ready to process files immediately

Use Cases

- **AI Workflows**: Convert documents for LLM analysis and processing
- **RAG Systems**: Prepare clean text for vector database ingestion
- **Content Management**: Batch convert files to searchable markdown format
- **Document Processing**: Extract text from mixed file types in automated pipelines

The workflow handles the complexity of different file formats while delivering consistent, AI-ready markdown output for your automation needs.

Complexity: beginner
Author: Joseph
Created: 9/10/2025
Updated: 9/24/2025
