Route AI tasks between Anthropic Claude models with Postgres policies and SLA constraints
Overview
This workflow implements a policy-driven LLM orchestration system that dynamically routes AI tasks to different language models based on task complexity, policies, and performance constraints.
Instead of sending every request to a single model, the workflow analyzes each task, applies policy rules, and selects the most appropriate model for execution. It also records telemetry data such as latency, token usage, and cost, enabling continuous optimization.
A built-in self-tuning mechanism runs weekly to analyze historical telemetry and automatically update routing policies. This allows the system to improve cost efficiency, performance, and reliability over time without manual intervention.
This architecture is useful for teams building AI APIs, agent platforms, or multi-model LLM systems where intelligent routing is needed to balance cost, speed, and quality.
How It Works
1. Webhook Task Input: The workflow begins when a request is sent to the webhook endpoint. The request contains a task and optional priority metadata.
2. Task Classification: A classifier agent analyzes the task and categorizes it as extraction, classification, reasoning, or generation. The agent also returns a confidence score.
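The classifier agent's structured output can be validated before it reaches the routing step. The field names below ("category", "confidence") are assumptions for illustration, not the template's exact schema; a minimal parsing sketch in Python:

```python
import json

# Task categories the classifier agent can return (from the workflow description).
CATEGORIES = {"extraction", "classification", "reasoning", "generation"}

def parse_classification(raw: str) -> dict:
    """Validate the classifier agent's JSON output.

    Field names ("category", "confidence") are illustrative assumptions.
    """
    result = json.loads(raw)
    if result.get("category") not in CATEGORIES:
        raise ValueError(f"unknown category: {result.get('category')!r}")
    confidence = float(result.get("confidence", 0.0))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return {"category": result["category"], "confidence": confidence}
```

Rejecting malformed classifier output early keeps bad data out of the policy engine and the telemetry table.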
3. Policy Engine: Policy rules are loaded from the database. These rules define execution constraints such as preferred model size, latency limits, token budgets, retry strategies, and cost ceilings.
4. Model Routing: A decision engine evaluates the classification results against the policy rules and routes each task to either a small model (fast and cost-efficient) or a large model (higher reasoning capability).
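The routing decision can be expressed as a small pure function over the classification result and the matching policy row. The model IDs, policy field names, and thresholds below are assumptions, not values shipped with the template:

```python
# Assumed model IDs for the two routing targets.
SMALL_MODEL = "claude-3-5-haiku-latest"  # fast, cost-efficient
LARGE_MODEL = "claude-sonnet-4-5"        # higher reasoning capability

def route(category: str, confidence: float, policy: dict) -> str:
    """Pick a model: policies can pin a task to the large model; otherwise
    simple, high-confidence tasks go small and everything else goes large.

    Policy keys ("preferred_model_size", "min_confidence") are illustrative.
    """
    if policy.get("preferred_model_size") == "large":
        return LARGE_MODEL
    if category in ("extraction", "classification") and confidence >= policy.get("min_confidence", 0.8):
        return SMALL_MODEL
    return LARGE_MODEL
```

Keeping the decision in one place makes it easy for the weekly self-tuning step to adjust a single threshold rather than rewrite routing logic.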
5. Task Execution: The selected LLM processes the task and generates the response.
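Inside n8n, the LangChain LLM nodes handle the model call. For reference, a raw request against Anthropic's public Messages API looks like the sketch below (endpoint, headers, and body shape per Anthropic's API; the helper function itself is illustrative):

```python
import json
import os
import urllib.request

def build_request(model: str, task: str, max_tokens: int = 1024) -> urllib.request.Request:
    """Build a request for Anthropic's Messages API.

    Reads the API key from the ANTHROPIC_API_KEY environment variable;
    in n8n this is handled by the configured Anthropic credentials.
    """
    body = json.dumps({
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": task}],
    }).encode()
    return urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=body,
        headers={
            "x-api-key": os.environ.get("ANTHROPIC_API_KEY", ""),
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )
```

Sending the request (e.g. with `urllib.request.urlopen`) returns a JSON body whose usage fields feed the telemetry step.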
6. Telemetry Collection: Execution metrics are captured, including latency, tokens used, estimated cost, model used, and success status. These metrics are stored in the database.
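A telemetry row can be assembled from the execution result before insertion. The per-million-token prices below are placeholders, not Anthropic's actual rates, and the field names are assumptions matching the metrics listed above:

```python
import time

# Placeholder pricing (USD per million tokens) -- substitute current rates.
PRICING = {
    "small": {"input": 0.80, "output": 4.00},
    "large": {"input": 3.00, "output": 15.00},
}

def telemetry_record(model_size: str, started_at: float,
                     input_tokens: int, output_tokens: int,
                     success: bool) -> dict:
    """Build one telemetry row: latency, token usage, estimated cost, outcome."""
    price = PRICING[model_size]
    cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    return {
        "model_size": model_size,
        "latency_ms": int((time.time() - started_at) * 1000),
        "tokens_used": input_tokens + output_tokens,
        "estimated_cost_usd": round(cost, 6),
        "success": success,
    }
```

Each row maps onto one INSERT into the telemetry table, which the weekly optimization job later aggregates.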
7. Weekly Self-Optimization: A scheduled workflow analyzes telemetry from the past 7 days. If performance trends change, routing policies are automatically updated.
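One simple self-tuning rule, sketched under assumed field names and illustrative thresholds: if the small model succeeded on nearly all of its traffic in the window, lower the confidence bar so it receives more tasks; if its success rate dropped, raise the bar.

```python
from statistics import mean

def tune_policy(telemetry: list, policy: dict) -> dict:
    """Adjust the routing confidence threshold from a week of telemetry.

    The 0.95 / 0.80 success-rate cutoffs and 0.05 step are illustrative.
    """
    small = [t for t in telemetry if t["model_size"] == "small"]
    if not small:
        return policy
    success_rate = mean(1.0 if t["success"] else 0.0 for t in small)
    updated = dict(policy)
    bar = policy.get("min_confidence", 0.8)
    if success_rate >= 0.95:
        updated["min_confidence"] = max(0.5, bar - 0.05)  # send more traffic small
    elif success_rate < 0.80:
        updated["min_confidence"] = min(0.95, bar + 0.05)  # be more conservative
    return updated
```

The bounds on the threshold keep a run of unlucky weeks from driving all traffic permanently to one model.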
Setup Instructions
1. Configure a Postgres database: Create two tables, policy_rules and telemetry.
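The column names below are assumptions derived from the constraints and metrics described above, not the template's exact schema. The DDL is kept portable and sanity-checked here against an in-memory SQLite database; run the equivalent statements against Postgres (e.g. swapping INTEGER PRIMARY KEY for BIGSERIAL):

```python
import sqlite3

# Assumed schema for routing policies.
POLICY_RULES_DDL = """
CREATE TABLE policy_rules (
    id                   INTEGER PRIMARY KEY,
    task_category        TEXT NOT NULL,
    preferred_model_size TEXT NOT NULL,  -- 'small' | 'large'
    max_latency_ms       INTEGER,
    token_budget         INTEGER,
    max_retries          INTEGER DEFAULT 2,
    cost_ceiling_usd     NUMERIC
);
"""

# Assumed schema for per-request execution metrics.
TELEMETRY_DDL = """
CREATE TABLE telemetry (
    id                 INTEGER PRIMARY KEY,
    model_used         TEXT NOT NULL,
    latency_ms         INTEGER,
    tokens_used        INTEGER,
    estimated_cost_usd NUMERIC,
    success            BOOLEAN,
    created_at         TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""

# Sanity-check the DDL parses and creates both tables.
conn = sqlite3.connect(":memory:")
conn.execute(POLICY_RULES_DDL)
conn.execute(TELEMETRY_DDL)
```

Adjust column names and types to match whatever the workflow's Postgres nodes actually read and write.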
2. Add LLM credentials: Configure Anthropic credentials for the language model nodes.
3. Configure policy rules: Define preferred models, cost limits, and latency thresholds in the policy_rules table.
4. Configure workflow settings: Adjust maximum latency, cost ceiling, token limits, and retry behavior in the Workflow Configuration node.
5. Deploy the API endpoint: Activate the workflow and send requests to the webhook endpoint.
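A request to the deployed endpoint might look like the sketch below. The URL is a placeholder for your n8n instance's production webhook path, and the payload field names ("task", "priority") are assumptions matching the input described above:

```python
import json
import urllib.request

# Placeholder: substitute your n8n instance's production webhook URL.
WEBHOOK_URL = "https://your-n8n-host/webhook/llm-router"

# Example payload: a task plus optional priority metadata (assumed field names).
payload = json.dumps({
    "task": "Extract the invoice number and total from the attached text.",
    "priority": "normal",
}).encode()

def send_task() -> dict:
    """POST the task to the webhook and return the parsed JSON response."""
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The same request works from any HTTP client; the webhook node parses the JSON body and hands it to the classifier step.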
Use Cases
- AI API Gateway: Route requests to different models based on complexity and cost constraints.
- Multi-Model AI Platforms: Automatically choose the best model for each task without manual configuration.
- Cost-Optimized AI Systems: Prefer smaller models for simple tasks while reserving larger models for complex reasoning.
- LLM Observability: Track token usage, latency, and cost for each AI request.
- Self-Optimizing AI Infrastructure: Automatically improve routing policies using real execution telemetry.
Requirements
- n8n with LangChain nodes enabled
- Postgres database
- Anthropic API credentials
- Tables: policy_rules and telemetry
Optional:
- Monitoring dashboards connected to telemetry data
- External policy management systems