๐ Learn Evaluate tool. Tutorial for Beginners with Gemini and Google Sheets
This workflow is a beginner-friendly tutorial demonstrating how to use the Evaluation tool to automatically score the AIโs output against a known correct answer (โground truthโ) stored in a Google Sheet.
Advantages
โ Beginner-friendly โ Provides a simple and clear structure to understand AI evaluation. โ Flexible input sources โ Works with both Google Sheets datasets and manual test entries. โ Integrated with Google Gemini โ Leverages a powerful AI model for text-based tasks. โ Tool usage โ Demonstrates how an AI agent can call external tools (e.g., calculator) for accurate answers. โ Automated evaluation โ Outputs are automatically compared against ground truth data for factual correctness. โ Scalable testing โ Can handle multiple dataset rows, making it useful for structured AI model evaluation. โ Result tracking โ Saves both answers and correctness scores back to Google Sheets for easy monitoring.
How it Works
The workflow operates in two distinct modes, determined by the trigger: Manual Test Mode: Triggered by "When clicking 'Execute workflow'". It sends a fixed question ("How much is 8 * 3?") to the AI agent and returns the answer to the user. This mode is for quick, ad-hoc testing. Evaluation Mode: Triggered by "When fetching a dataset row". This mode reads rows of data from a linked Google Sheet. Each row contains an input (a question) and an expected_output (the correct answer). It processes each row as follows: The input question is sent to the AI Agent node. The AI Agent, powered by a Google Gemini model and equipped with a Calculator tool, processes the question and generates an answer (output). The workflow then checks if it's in evaluation mode. Instead of just returning the answer, it passes the AI's actual_output and the sheet's expected_output to another Evaluation node. This node uses a second Google Gemini model as a "judge" to evaluate the factual correctness of the AI's answer compared to the expected one, generating a Correctness score on a scale from 1 to 5. Finally, both the AI's actual_output and the automated correctness score are written back to a new column in the same row of the Google Sheet.
Set up Steps
To use this workflow, you need to complete the following setup steps:
Credentials Configuration: Set up the Google Sheets OAuth2 API credentials (named "Google Sheets account"). This allows n8n to read from and write to your Google Sheet. Set up the Google Gemini (PaLM) API credentials (named "Google Gemini(PaLM) (Eure)"). This provides the AI language model capabilities for both the agent and the evaluator.
Prepare Your Google Sheet: The workflow is pre-configured to use a specific Google Sheet. You must clone the provided template sheet (the URL is in the Sticky Note) to your own Google Drive. In your cloned sheet, ensure you have at least two columns: one for the input/question (e.g., input) and one for the expected correct answer (e.g., expected_output). You may need to update the node parameters that reference $json.input and $json.expected_output to match your column names exactly.
Update Document IDs: After cloning the sheet, get its new Document ID from its URL and update the documentId field in all three Evaluation nodes ("When fetching a dataset row", "Set output Evaluation", and "Set correctness") to point to your new sheet instead of the original template.
Activate the Workflow:
Once the credentials and sheet are configured, toggle the workflow to Active. You can then trigger a manual test run or set the "When fetching a dataset row" node to poll your sheet automatically to evaluate all rows.
Need help customizing?
Contact me for consulting and support or add me on Linkedin.
Related Templates
Track Demo Bookings with Google Calendar to Meta Conversions API Integration
Who is this workflow for? If you're using Meta Ads to generate new leads to your sales pipeline, this workflow is for yo...
Build a PDF-Based RAG System with OpenAI, Pinecone and Cohere Reranking
This workflow contains community nodes that are only compatible with the self-hosted version of n8n. This workflow prov...
Reusable and Independently Testable Sub-workflow
Reusable and Independently Testable Sub-workflow This n8n workflow provides a standardized structure for building and te...
๐ Please log in to import templates to n8n and favorite templates
Workflow Visualization
Loading...
Preparing workflow renderer
Comments (0)
Login to post comments