Evaluation metric example: Correctness (judged by AI)

Author: David Roberts

AI evaluation in n8n

This is a template for n8n's evaluation feature.

Evaluation is a technique for building confidence that your AI workflow performs reliably: you run a test dataset containing different inputs through the workflow.

By calculating a metric (score) for each input, you can see where the workflow is performing well and where it isn't.
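As a rough illustration of the idea, independent of n8n itself, the sketch below runs a small dataset through a stand-in workflow and scores each output. Every name here (`Example`, `run_evaluation`, `toy_workflow`, `exact_match`) is hypothetical, and the exact-match metric is only a placeholder that keeps the example runnable without an AI model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    """One row of the test dataset (hypothetical structure)."""
    question: str
    expected: str


def run_evaluation(
    workflow: Callable[[str], str],
    metric: Callable[[str, str], float],
    dataset: list[Example],
) -> list[dict]:
    """Run every example through the workflow and score each output."""
    results = []
    for example in dataset:
        actual = workflow(example.question)
        results.append({
            "question": example.question,
            "actual": actual,
            "score": metric(actual, example.expected),
        })
    return results


# Toy stand-ins (assumptions) so the sketch runs without an AI model.
def toy_workflow(question: str) -> str:
    known = {"What triggered World War I?": "The assassination of Archduke Franz Ferdinand"}
    return known.get(question, "I don't know")


def exact_match(actual: str, expected: str) -> float:
    return 1.0 if actual.strip().lower() == expected.strip().lower() else 0.0


dataset = [
    Example("What triggered World War I?", "The assassination of Archduke Franz Ferdinand"),
    Example("What caused the fall of the Berlin Wall?", "placeholder reference answer"),
]
results = run_evaluation(toy_workflow, exact_match, dataset)

# Inputs whose score falls short show where the workflow is weak.
weakest = [r["question"] for r in results if r["score"] < 1.0]
```

Keeping one score per input, rather than only an average, is what makes it possible to see which inputs the workflow handles badly.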

How it works

This template shows how to calculate a workflow evaluation metric: whether an output matches an expected output (i.e. has the same meaning).

The workflow answers questions about the causes of historical events, and its answers are compared with the reference answers in the dataset.
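One way to picture an AI-judged correctness metric is sketched below. It is not the template's actual prompt or scoring scale: the prompt wording, the 1 to 5 scale, and the names `build_judge_prompt`, `parse_score` and `correctness` are all assumptions made for illustration. The judge model is passed in as a plain function so that no particular AI provider is implied.

```python
import re
from typing import Callable

# Hypothetical prompt; the template's real wording may differ.
JUDGE_TEMPLATE = (
    "You are grading an answer to a history question.\n"
    "Question: {question}\n"
    "Reference answer: {expected}\n"
    "Answer to grade: {actual}\n"
    "Reply with a single integer from 1 (different meaning) to 5 (same meaning)."
)


def build_judge_prompt(question: str, expected: str, actual: str) -> str:
    """Fill the grading prompt with the question and both answers."""
    return JUDGE_TEMPLATE.format(question=question, expected=expected, actual=actual)


def parse_score(reply: str) -> int:
    """Extract the first standalone digit from 1 to 5 in the judge's reply."""
    match = re.search(r"\b[1-5]\b", reply)
    if match is None:
        raise ValueError(f"no score found in judge reply: {reply!r}")
    return int(match.group())


def correctness(
    question: str,
    expected: str,
    actual: str,
    judge: Callable[[str], str],
) -> int:
    """Ask the judge model to compare meanings and return its score."""
    return parse_score(judge(build_judge_prompt(question, expected, actual)))


# A fake judge (assumption) so the sketch runs without calling a model.
def fake_judge(prompt: str) -> str:
    return "Score: 4"


score = correctness(
    "What triggered World War I?",
    "The assassination of Archduke Franz Ferdinand",
    "Franz Ferdinand was assassinated in Sarajevo",
    fake_judge,
)
```

Asking the judge about meaning rather than exact wording is the point of the metric: two answers phrased differently can still score highly.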

We use an evaluation trigger to read in our dataset.

It is wired up in parallel with the regular chat trigger so that the workflow can be started from either one.

If we're evaluating (i.e. the execution started from the evaluation trigger), we calculate the correctness metric using AI.

We pass this information back to n8n as a metric.

If we're not evaluating, we avoid calculating the metric, to reduce cost.
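The branching described above can be sketched in plain Python. This is only an analogy for the workflow's wiring, not n8n code: `handle_execution`, the `"chat"` and `"evaluation"` labels, and the stub functions are all hypothetical.

```python
from typing import Callable, Optional


def handle_execution(
    question: str,
    started_from: str,  # "chat" or "evaluation" (hypothetical labels)
    answer_fn: Callable[[str], str],
    score_fn: Callable[[str, Optional[str]], int],
    expected: Optional[str] = None,
) -> dict:
    """Answer the question; score the answer only for evaluation runs."""
    answer = answer_fn(question)
    result: dict = {"answer": answer}
    if started_from == "evaluation":
        # Only evaluation runs pay for the extra judge call.
        result["metrics"] = {"correctness": score_fn(answer, expected)}
    return result


# Stubs (assumptions) that record how often the judge would be called.
judge_calls = []


def fake_score(actual: str, expected: Optional[str]) -> int:
    judge_calls.append((actual, expected))
    return 5


def stub_answer(question: str) -> str:
    return "stub answer"


chat_result = handle_execution("Why did Rome fall?", "chat", stub_answer, fake_score)
eval_result = handle_execution(
    "Why did Rome fall?", "evaluation", stub_answer, fake_score, expected="stub answer"
)
```

Both entry points share the same answering step, so the evaluation measures the same logic that chat users see, while the scoring cost is incurred only during evaluation runs.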

Complexity: intermediate

Category: AI & Machine Learning

Created: 8/13/2025

Updated: 1/5/2026
