Extract and Structure Thai Documents to Google Sheets using Typhoon OCR and Llama 3.1

⚠️ Note: This template requires a community node and works only on self-hosted n8n installations. It uses the Typhoon OCR Python package and custom command execution. Make sure to install required dependencies locally.

Who is this for?

This template is for developers, operations teams, and automation builders in Thailand (or any Thai-speaking environment) who regularly process PDFs or scanned documents in Thai and want to extract structured text into a Google Sheet.

It is ideal for:

- Local government document processing
- Thai-language enterprise paperwork
- AI automation pipelines requiring Thai OCR

What problem does this solve?

Typhoon OCR is one of the most accurate OCR tools for Thai text. However, integrating it into an end-to-end workflow usually requires manual scripting and data wrangling.

This template solves that by:

- Running Typhoon OCR on PDF files
- Using AI to extract structured data fields
- Automatically storing results in Google Sheets

What this workflow does

1. Trigger: Run manually or from any automation source
2. Read Files: Load local PDF files from a doc/ folder
3. Execute Command: Run Typhoon OCR on each file using a Python command
4. LLM Extraction: Send the OCR markdown to an AI model (e.g., GPT-4 or a model served through OpenRouter) to extract fields
5. Code Node: Parse the LLM output as JSON
6. Google Sheets: Append structured data into a spreadsheet
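The Code Node step has to cope with models that wrap their JSON answer in a Markdown code fence or add a sentence around it. Below is a minimal sketch of that parsing logic in Python, assuming the model was asked to return a single JSON object. The function name `parse_llm_json` is hypothetical, and the template's actual Code Node may be written differently.

```python
import json
import re

# Built at runtime so this example does not contain a literal code fence.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL
)


def parse_llm_json(text: str) -> dict:
    """Extract the first JSON object from an LLM reply.

    Handles replies wrapped in a Markdown code fence or surrounded by prose.
    Raises ValueError if no JSON object can be decoded.
    """
    # Prefer the contents of a fenced block if one is present.
    fenced = FENCED_BLOCK.search(text)
    candidate = fenced.group(1) if fenced else text

    # Trim to the outermost braces to drop leading or trailing prose.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in LLM output")

    # json.JSONDecodeError is a subclass of ValueError.
    return json.loads(candidate[start : end + 1])
```

Raising on malformed output, rather than returning an empty record, lets the workflow surface a failed extraction instead of appending a blank row.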

Setup

  1. Install Requirements

- Python 3.10+
- typhoon-ocr: pip install typhoon-ocr
- Poppler, installed and added to the system PATH (needed for pdftoppm and pdfinfo)
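Before wiring up the workflow, it can help to confirm these requirements from the same environment that n8n will use. The following is a small sketch, not part of the template; the helper name `missing_requirements` is hypothetical, and it assumes the package installs an importable module named `typhoon_ocr`.

```python
import importlib.util
import shutil
import sys


def missing_requirements() -> list:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    # Assumption: typhoon-ocr installs the importable module `typhoon_ocr`.
    if importlib.util.find_spec("typhoon_ocr") is None:
        problems.append("typhoon-ocr is not installed (pip install typhoon-ocr)")
    # Poppler provides these two command-line tools.
    for tool in ("pdftoppm", "pdfinfo"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH (install Poppler)")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("MISSING:", problem)
```

Run it with the same user and shell that start n8n, since PATH can differ between an interactive terminal and a service or container.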

  2. Create folders

Create a folder called doc in the directory where n8n runs (or mount it into the container if you run n8n in Docker).
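As a rough illustration of what the Read Files and Execute Command steps do with this folder, the sketch below walks doc/ and runs OCR on each PDF. It assumes that `typhoon_ocr` exposes `ocr_document(path)` returning markdown text; verify this against the package documentation for your installed version. The helper `ocr_folder` and its `ocr_fn` parameter are hypothetical and exist only so the loop can be tried without the package or an API key.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def ocr_folder(folder: str = "doc",
               ocr_fn: Optional[Callable[[str], str]] = None) -> Dict[str, str]:
    """Run OCR on every PDF in `folder` and return {file name: markdown}."""
    if ocr_fn is None:
        # Imported lazily so the helper can be exercised without the package.
        # Assumption: typhoon_ocr exposes ocr_document(path) -> markdown string.
        from typhoon_ocr import ocr_document
        ocr_fn = ocr_document

    results = {}
    # sorted() keeps the processing order stable between runs.
    for pdf in sorted(Path(folder).glob("*.pdf")):
        results[pdf.name] = ocr_fn(str(pdf))
    return results
```

Passing a stand-in such as `ocr_fn=lambda path: "stub"` is a quick way to check that the folder is mounted and readable before spending API calls.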

  3. Google Sheet

Create a Google Sheet with the following column headers:

| book_id | date | subject | detail | signed_by | signed_by2 | contact | download_url |
| ------- | ---- | ------- | ------ | --------- | ---------- | ------- | ------------ |

You can use this example Google Sheet as a reference.
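To show how an extracted record lines up with these headers, here is a minimal sketch that orders a record's values by column and fills gaps with empty strings. The names `COLUMNS` and `to_row` are hypothetical. In the workflow itself the Google Sheets node handles the field mapping, so this only illustrates the expected shape of the data.

```python
# Column order copied from the sheet headers above.
COLUMNS = ("book_id", "date", "subject", "detail",
           "signed_by", "signed_by2", "contact", "download_url")


def to_row(record: dict) -> list:
    """Order a record's values to match the sheet columns.

    Missing or null fields become empty strings; keys that are not
    sheet columns are ignored.
    """
    return ["" if record.get(col) is None else str(record[col]) for col in COLUMNS]
```

For example, `to_row({"book_id": 12, "subject": "x"})` yields an eight-element list with "12" first, "x" third, and empty strings elsewhere.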

  4. API Key

Export TYPHOON_OCR_API_KEY and OPENAI_API_KEY in the environment that runs n8n, or set them inside the command string in the Execute Command node.
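A quick way to confirm the keys are visible to the process is to check the environment from Python. This is a sketch only; the helper name `missing_api_keys` is hypothetical, and it checks just the two variable names mentioned above.

```python
import os


def missing_api_keys(env=None,
                     names=("TYPHOON_OCR_API_KEY", "OPENAI_API_KEY")) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

The script reports only variable names, never the values, so its output is safe to paste into logs or support requests.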

How to customize this workflow

- Replace the LLM provider in the Basic LLM Chain node (currently supports OpenRouter)
- Change output fields to match your data structure (adjust the prompt and Google Sheet headers)
- Add trigger nodes (e.g., Dropbox Upload, Webhook) to automate input
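When changing output fields, keeping the field list in one place helps the prompt and the sheet headers stay in sync. The sketch below builds an extraction prompt from such a list. The prompt wording and the names `FIELDS` and `build_extraction_prompt` are hypothetical illustrations, not the prompt shipped with the template.

```python
# Default fields mirror the sheet headers used by this template.
FIELDS = ("book_id", "date", "subject", "detail",
          "signed_by", "signed_by2", "contact", "download_url")


def build_extraction_prompt(ocr_markdown: str, fields=FIELDS) -> str:
    """Build an LLM prompt asking for one JSON object with the given keys."""
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the Thai document below and "
        f"reply with a single JSON object using exactly these keys: {field_list}. "
        "Use an empty string for any field that is not present.\n\n"
        f"Document:\n{ocr_markdown}"
    )
```

Adding or renaming a field then means editing the tuple and the matching Google Sheet header, with no separate prompt text to update.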

About Typhoon OCR

Typhoon is a multilingual LLM and toolkit optimized for Thai NLP. It includes typhoon-ocr, a Python OCR library designed for Thai-centric documents. The library is open-source, highly accurate, and works well in automation pipelines, which makes it well suited to government paperwork, PDF reports, and multilingual documents in Southeast Asia.

Author: Jaruphat J.
Created: 8/13/2025
Updated: 8/25/2025
