Extract and Structure Thai Documents to Google Sheets using Typhoon OCR and Llama 3.1
⚠️ Note: This template requires a community node and works only on self-hosted n8n installations. It uses the Typhoon OCR Python package and custom command execution. Make sure to install required dependencies locally.
Who is this for?
This template is for developers, operations teams, and automation builders in Thailand (or any Thai-speaking environment) who regularly process PDFs or scanned documents in Thai and want to extract structured text into a Google Sheet.
It is ideal for: Local government document processing Thai-language enterprise paperwork AI automation pipelines requiring Thai OCR
What problem does this solve?
Typhoon OCR is one of the most accurate OCR tools for Thai text. However, integrating it into an end-to-end workflow usually requires manual scripting and data wrangling.
This template solves that by: Running Typhoon OCR on PDF files Using AI to extract structured data fields Automatically storing results in Google Sheets
What this workflow does
Trigger: Run manually or from any automation source Read Files: Load local PDF files from a doc/ folder Execute Command: Run Typhoon OCR on each file using a Python command LLM Extraction: Send the OCR markdown to an AI model (e.g., GPT-4 or OpenRouter) to extract fields Code Node: Parse the LLM output as JSON Google Sheets: Append structured data into a spreadsheet
Setup
- Install Requirements
Python 3.10+ typhoon-ocr: pip install typhoon-ocr Install Poppler and add to system PATH (needed for pdftoppm, pdfinfo)
- Create folders
Create a folder called doc in the same directory where n8n runs (or mount it via Docker)
- Google Sheet
Create a Google Sheet with the following column headers:
| book_id | date | subject | detail | signed_by | signed_by2 | contact | download_url | | -------- | ---- | ------- | ------ | ---------- | ----------- | ------- | ------------- |
You can use this example Google Sheet as a reference.
- API Key
Export your TYPHOON_OCR_API_KEY and OPENAI_API_KEY in your environment (or set inside the command string in Execute Command node).
How to customize this workflow
Replace the LLM provider in the Basic LLM Chain node (currently supports OpenRouter) Change output fields to match your data structure (adjust the prompt and Google Sheet headers) Add trigger nodes (e.g., Dropbox Upload, Webhook) to automate input
About Typhoon OCR
Typhoon is a multilingual LLM and toolkit optimized for Thai NLP. It includes typhoon-ocr, a Python OCR library designed for Thai-centric documents. It is open-source, highly accurate, and works well in automation pipelines. Perfect for government paperwork, PDF reports, and multilingual documents in Southeast Asia.
Related Templates
Bulk Automated Google Drive Files Sharing and Direct Download Link Generation
This N8N workflow automates the process of sharing files from Google Drive. It includes OAuth2 authentication, batch pro...
USDT And TRC20 Wallet Tracker API Workflow for n8n
Overview This n8n workflow is specifically designed to monitor USDT TRC20 transactions within a specified wallet. It u...
Add product ideas to Google Sheets via a Slack
Use Case This workflow is a slight variation of a workflow we're using at n8n. In most companies, employees have a lot o...
🔒 Please log in to import templates to n8n and favorite templates
Workflow Visualization
Loading...
Preparing workflow renderer
Comments (0)
Login to post comments