Generate consensus-based answers using Claude, GPT, Grok and Gemini

The original LLM Council concept was introduced by Andrej Karpathy and published as an open-source repository demonstrating multi-model consensus and ranking. This workflow is my adaptation of that original idea, reimplemented and structured as a production-ready n8n template. Original repository: https://github.com/karpathy/llm-council

This n8n template implements the LLM Council pattern: a single user question is processed in parallel by multiple large language models, independently evaluated by peer models, and then synthesized into one high-quality, consensus-driven final answer. It is designed for use cases where answer quality, balance, and reduced single-model bias are critical.

📌 Section 1: Trigger & Input

⚡ When Chat Message Received (Chat Trigger)

Purpose: Receives a user's message and initiates the entire workflow.

How it works:

A user sends a chat message

The message is stored as the Original Question

The same input is forwarded simultaneously to multiple LLM pipelines

Why it matters: Provides a clean, unified entry point for all downstream multi-model logic.

📌 Section 2: Stage 1 – Parallel LLM Responses

🤖 Basic LLM Chains (x4)

Models used:

Anthropic Claude

OpenAI GPT

xAI Grok

Google Gemini

Purpose: Each model independently generates its own response to the same question.

Key characteristics:

Identical prompt structure for all models

Independent reasoning paths

No shared context between models

Why it matters: Produces diverse perspectives, reasoning styles, and solution approaches.
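The fan-out step can be sketched in JavaScript, the language of the workflow's Code node. The `buildPrompt` helper and the model list here are illustrative assumptions, not the template's actual node configuration:

```javascript
// Illustrative sketch: one identical prompt fanned out to four independent chains.
// buildPrompt and the model identifiers are hypothetical, not the template's config.
function buildPrompt(question) {
  return `Answer the following question as accurately and completely as you can.\n\nQuestion: ${question}`;
}

const models = ["claude", "gpt", "grok", "gemini"]; // four independent LLM chains

function fanOut(question) {
  // Every model receives the same prompt and no shared context.
  return models.map((model) => ({ model, prompt: buildPrompt(question) }));
}
```

Because each chain only sees `{ model, prompt }`, no model can condition on another model's output at this stage.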

📌 Section 3: Stage 2 – Response Anonymization

🧾 Set Nodes (Response A / B / C / D)

Purpose: Stores model outputs in an anonymized format:

Response A

Response B

Response C

Response D

Why it matters: Prevents evaluator models from knowing which LLM authored which response, reducing bias during evaluation.
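The relabeling done by the Set nodes can be sketched as a single mapping function; the object shapes and the `anonymize` name are assumptions for illustration:

```javascript
// Minimal sketch of the anonymization step performed by the Set nodes.
// The input/output shapes and function name are hypothetical.
function anonymize(outputs) {
  // outputs: { claude: "...", gpt: "...", grok: "...", gemini: "..." }
  const labels = ["Response A", "Response B", "Response C", "Response D"];
  const anonymized = {}; // what the evaluator models see
  const mapping = {};    // kept out of the evaluators' context
  Object.keys(outputs).forEach((model, i) => {
    anonymized[labels[i]] = outputs[model];
    mapping[labels[i]] = model;
  });
  return { anonymized, mapping };
}
```

Only `anonymized` is forwarded to the evaluation chains; the `mapping` is retained separately so authorship can be recovered after ranking if needed.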

📌 Section 4: Stage 3 – Peer Evaluation & Ranking

📊 Evaluation Chains (Claude / GPT / Grok / Gemini)

Purpose: Each model acts as a reviewer and:

Analyzes all four anonymized responses

Describes strengths and weaknesses of each

Produces a strict FINAL RANKING from best to worst

Ranking format (strict):

FINAL RANKING:
Response B
Response A
Response D
Response C

Why it matters: Creates multiple independent quality assessments from different model perspectives.
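Because the ranking format is strict, it can be extracted from each evaluator's free-text reply with a simple regular expression. This is a sketch under that assumption; the `parseRanking` name is illustrative:

```javascript
// Sketch: pull the strict "FINAL RANKING" block out of an evaluator's reply.
// Tolerates the labels being on one line or on separate lines.
function parseRanking(text) {
  const match = text.match(/FINAL RANKING:\s*((?:Response [A-D]\s*)+)/);
  if (!match) return null; // evaluator did not follow the strict format
  return match[1].match(/Response [A-D]/g); // ordered best to worst
}
```

Returning `null` on a malformed reply lets the aggregation step skip that evaluator instead of corrupting the consensus.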

📌 Section 5: Stage 4 – Ranking Aggregation

🧮 Code Node (JavaScript)

Purpose: Aggregates all peer rankings by:

Parsing ranking positions

Calculating average position per response

Counting evaluation occurrences

Sorting responses by best average score

Output includes:

Aggregated rankings

Best response label

Best average score

Why it matters: Transforms subjective rankings into a structured, quantitative consensus.
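The aggregation described above can be sketched as follows; the data shape and the `aggregate` name are assumptions, not the template's exact Code-node source:

```javascript
// Sketch of the aggregation logic: average each response's rank position
// across all evaluators, count occurrences, and sort best-first.
// Input shape is hypothetical: an array of ordered label lists.
function aggregate(rankings) {
  const stats = {};
  for (const ranking of rankings) {
    ranking.forEach((label, index) => {
      if (!stats[label]) stats[label] = { positions: 0, count: 0 };
      stats[label].positions += index + 1; // rank positions start at 1
      stats[label].count += 1;             // evaluation occurrences
    });
  }
  const aggregated = Object.entries(stats)
    .map(([label, s]) => ({ label, count: s.count, avg: s.positions / s.count }))
    .sort((a, b) => a.avg - b.avg); // lower average position = better
  return { aggregated, best: aggregated[0].label, bestScore: aggregated[0].avg };
}
```

For example, if two evaluators both rank Response B first, its average position is 1.0 and it becomes the consensus best regardless of how the middle positions shuffle.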

📌 Section 6: Stage 5 – Final Consensus Answer

🧠 Chairman LLM Chain

Purpose: One model acts as the Council Chairman and:

Reviews all original responses

Considers peer rankings and aggregated scores

Identifies consensus patterns and disagreements

Produces a single, clear, high-quality final answer

Why it matters: Delivers a refined response that reflects collective model intelligence rather than a simple average.
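One plausible way to assemble the Chairman's context from the earlier stages is sketched below; the function name, wording, and input shapes are all hypothetical:

```javascript
// Hypothetical sketch: compose the Chairman prompt from the original question,
// the anonymized responses, and the aggregated consensus ranking.
function buildChairmanPrompt(question, responses, aggregated) {
  const responseText = Object.entries(responses)
    .map(([label, text]) => `${label}:\n${text}`)
    .join("\n\n");
  const rankingText = aggregated
    .map((r, i) => `${i + 1}. ${r.label} (average position ${r.avg})`)
    .join("\n");
  return [
    `Original question: ${question}`,
    `Anonymized responses:\n${responseText}`,
    `Consensus ranking:\n${rankingText}`,
    "Synthesize one final answer that reflects the consensus and resolves disagreements.",
  ].join("\n\n");
}
```

Keeping the responses anonymized even at this stage means the Chairman weighs arguments by their ranked quality, not by which vendor produced them.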

📊 Workflow Overview

Stage 1 – Chat Trigger: Receive user question
Stage 2 – LLM Chains: Generate independent responses
Stage 3 – Set Nodes: Anonymize outputs
Stage 4 – Evaluation Chains: Peer review & ranking
Stage 5 – Code Node: Aggregate rankings
Stage 6 – Chairman LLM: Final synthesized answer

🎯 Key Benefits

🧠 Multi-model intelligence – avoids reliance on a single LLM

⚖️ Reduced bias – anonymized peer evaluation

📊 Quality-driven selection – ranking-based consensus

🔁 Modular architecture – easy to add or replace models

🌍 Language-flexible – input and output languages configurable

🧩 Production-ready logic – clear stages, deterministic ranking

🚀 Ideal Use Cases

High-stakes decision support

Complex technical or architectural questions

Strategy and research synthesis

AI assistants requiring higher trust and reliability

Comparing and selecting the best LLM-generated answers

Complexity: intermediate
Author: Yehor EGMS
Created: 2/13/2026
Updated: 3/11/2026
