Simple Eval for Legal Benchmarking

Author: Adam Janes

This workflow demonstrates a simple way to run evals on a set of test cases stored in a Google Sheet.

The example comes from an information extraction dataset, in which we tested 6 LLMs on 18 test cases.

To get started, take a look at the sample data in our spreadsheet.

Once this works with our dataset, you can plug in your own test cases and the LLMs you want to compare to see how it performs on your own data.

How it works: The workflow loads test cases from Google Sheets. For each row in the sheet, it fetches the source document and converts it to text. An "LLM judge" then passes the input and output of each LLM to GPT-4.1, which evaluates the test case (Pass/Fail plus a reason). The outcome is logged to a Google Sheet. A 0.5s pause between requests helps the workflow stay within OpenAI's API rate limits.
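The loop described above can be sketched in Python as follows. This is a minimal illustration and not the workflow's actual implementation: the row keys (document, question, answer), the prompt wording, and the JSON reply format are all assumptions made for the example, and the judge is passed in as a plain callable so that any client for GPT-4.1 can be plugged in.

```python
import json
import time
from typing import Callable, Dict, Iterable, List

# Hypothetical judge prompt; the real workflow's wording is not shown in the source.
JUDGE_PROMPT = (
    "You are grading an information-extraction answer.\n"
    "Source document:\n{document}\n\n"
    "Question:\n{question}\n\n"
    "Model answer:\n{answer}\n\n"
    'Reply with JSON: {{"verdict": "Pass" or "Fail", "reason": "..."}}'
)


def parse_verdict(raw: str) -> Dict[str, str]:
    """Parse the judge's reply; anything malformed is recorded as a Fail."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"verdict": "Fail", "reason": "unparseable judge reply"}
    if not isinstance(data, dict):
        return {"verdict": "Fail", "reason": "unparseable judge reply"}
    is_pass = str(data.get("verdict", "")).strip().lower() == "pass"
    return {"verdict": "Pass" if is_pass else "Fail",
            "reason": str(data.get("reason", ""))}


def run_evals(
    rows: Iterable[Dict[str, str]],
    judge: Callable[[str], str],
    pause_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, str]]:
    """Judge each row in turn, pausing between requests to respect rate limits."""
    results = []
    for i, row in enumerate(rows):
        if i > 0:
            sleep(pause_s)  # pause between requests, not before the first one
        prompt = JUDGE_PROMPT.format(
            document=row["document"],
            question=row["question"],
            answer=row["answer"],
        )
        outcome = parse_verdict(judge(prompt))
        results.append({**row, **outcome})  # original row plus verdict and reason
    return results
```

In this sketch, each returned dictionary corresponds to one line that would be written back to the results sheet. Treating an unparseable judge reply as a Fail is a design choice made here so that a malformed response never counts as a pass by accident.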

Set up steps: Add your credentials for Google Sheets, Google Drive, and OpenRouter. Make a copy of the original data spreadsheet so that you can edit it. Then point the Update Results node at your copy so that the spreadsheet updates on each run of the loop.


Complexity: intermediate

Category: AI & Machine Learning


Created: 8/13/2025

Updated: 11/18/2025
