Evaluation metric example: Correctness (judged by AI)
AI evaluation in n8n
This is a template for n8n's evaluation feature.
Evaluation is a technique for gaining confidence that your AI workflow performs reliably, by running a test dataset containing different inputs through the workflow.
By calculating a metric (score) for each input, you can see where the workflow is performing well and where it isn't.
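To make the idea concrete, here is what that per-input scoring loop looks like outside of n8n. This is a minimal sketch, not part of the template itself; `runWorkflow` and `scoreOutput` are hypothetical stand-ins for your workflow call and your metric.

```typescript
// Minimal sketch of dataset-driven evaluation, independent of n8n.
// runWorkflow and scoreOutput are hypothetical stand-ins, not template code.

type TestCase = { input: string; expected: string };

async function evaluateDataset(
  dataset: TestCase[],
  runWorkflow: (input: string) => Promise<string>,
  scoreOutput: (actual: string, expected: string) => Promise<number>,
): Promise<number> {
  let total = 0;
  for (const { input, expected } of dataset) {
    const actual = await runWorkflow(input); // run one test input through the workflow
    total += await scoreOutput(actual, expected); // metric (score) for this input
  }
  return total / dataset.length; // average score across the dataset
}
```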
How it works
This template shows how to calculate a workflow evaluation metric: whether an output matches an expected output (i.e. has the same meaning).
The workflow answers questions about the causes of historical events and compares its answers with the reference answers in the dataset.
- We use an evaluation trigger to read in our dataset. It is wired up in parallel with the regular chat trigger so that the workflow can be started from either one.
- If we're evaluating (i.e. the execution started from the evaluation trigger), we calculate the correctness metric using AI (sketched below).
- We pass this information back to n8n as a metric.
- If we're not evaluating, we skip calculating the metric to reduce cost.
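The correctness judge itself is just an LLM call that compares the workflow's answer to the reference answer and returns a score. Below is a minimal sketch, assuming an OpenAI-compatible chat completions endpoint, a Node 18+ runtime with global fetch, and an OPENAI_API_KEY environment variable; the prompt wording and the 1-5 scale are illustrative, not the template's exact prompt.

```typescript
// Minimal LLM-as-judge correctness check. Assumes an OpenAI-compatible
// chat completions API; the model name and grading scale are illustrative.

async function judgeCorrectness(
  question: string,
  actual: string,
  reference: string,
): Promise<number> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`, // assumed env var
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // any chat model works as the judge
      messages: [
        {
          role: "system",
          content:
            "You grade answers. Reply with a single integer from 1 (wrong) " +
            "to 5 (same meaning as the reference answer).",
        },
        {
          role: "user",
          content: `Question: ${question}\nReference answer: ${reference}\nSubmitted answer: ${actual}`,
        },
      ],
    }),
  });
  const data = await res.json();
  return Number(data.choices[0].message.content.trim()); // the metric value
}
```

In the template, the equivalent judge call happens in an AI node, and the resulting number is reported back through n8n's evaluation nodes so it appears as a metric for the run.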