Evaluate AI Agent Response Relevance using OpenAI and Cosine Similarity
This n8n template demonstrates how to calculate the evaluation metric "Relevance" which in this scenario, measures the relevance of the agent's response to the user's question.
The scoring approach is adapted from the open-source evaluations project RAGAS and you can see the source here https://github.com/explodinggradients/ragas/blob/main/ragas/src/ragas/metrics/_answer_relevance.py
How it works This evaluation works best for Q&A agents. For our scoring, we analyse the agent's response and ask another AI to generate a question from it. This generated question is then compared to the original question using cosine similarity. A high score indicates relevance and the agent's successful ability to answer the question whereas a low score means agent may have added too much irrelevant info, went off script or hallucinated.
Requirements n8n version 1.94+ Check out this Google Sheet for a sample data https://docs.google.com/spreadsheets/d/1YOnu2JJjlxd787AuYcg-wKbkjyjyZFgASYVV0jsij5Y/edit?usp=sharing
Related Templates
Reply to Outlook Emails with OpenAI
Who is this template for? This template is for any Microsoft Outlook user who wants a trained AI agent to reason and rep...
Text automations using Apple Shortcuts
Overview This workflow answers user requests sent via Mac Shortcuts Several Shortcuts call the same webhook, with a quer...
AI SEO Readability Audit: Check Website Friendliness for LLMs
Who is this for? This workflow is designed for SEO specialists, content creators, marketers, and website developers who ...
🔒 Please log in to import templates to n8n and favorite templates
Workflow Visualization
Loading...
Preparing workflow renderer
Comments (0)
Login to post comments