MTRouter: Cost-Aware Multi-Turn LLM Routing with History-Model Joint Embeddings
arXiv cs.CL / 4/28/2026
Key Points
- The paper proposes MTRouter, a cost-aware routing method that chooses which LLM from a model pool to call at each turn under a fixed inference budget.
- MTRouter represents both conversation history and candidate models using joint history–model embeddings, then learns a per-turn outcome (utility) estimator from logged trajectories.
- Experiments on ScienceWorld show MTRouter beating GPT-5 while cutting total inference cost by 58.7%.
- On Humanity’s Last Exam (HLE), it reaches competitive accuracy with a 43.4% total cost reduction versus GPT-5, and the improvements generalize to held-out tasks.
- Analysis attributes the gains to fewer model switches, higher tolerance to transient errors, and emergent specialization behavior across models.
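The per-turn selection loop described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the model pool, embedding dimension, and dot-product utility stub are all assumptions standing in for MTRouter's learned joint history–model embeddings and outcome estimator.

```python
import numpy as np

# Hypothetical sketch of cost-aware per-turn routing, assuming:
#  - a fixed pool of candidate LLMs, each with an embedding and a per-call cost
#  - a learned per-turn utility estimator, stubbed here as a dot product
#    between the conversation-history embedding and each model embedding.
# None of these names or values come from the paper; they only illustrate
# "pick the affordable model with the highest estimated utility".

rng = np.random.default_rng(0)
DIM = 8  # toy embedding dimension

MODEL_POOL = {
    "small-llm":  {"emb": rng.normal(size=DIM), "cost": 1.0},
    "medium-llm": {"emb": rng.normal(size=DIM), "cost": 3.0},
    "large-llm":  {"emb": rng.normal(size=DIM), "cost": 10.0},
}

def estimate_utility(history_emb, model_emb):
    """Stand-in for the learned outcome (utility) estimator."""
    return float(history_emb @ model_emb)

def route_turn(history_emb, budget_left):
    """Choose the highest-utility model whose cost fits the remaining budget."""
    best_name, best_util = None, -np.inf
    for name, info in MODEL_POOL.items():
        if info["cost"] > budget_left:
            continue  # skip models the remaining budget cannot pay for
        util = estimate_utility(history_emb, info["emb"])
        if util > best_util:
            best_name, best_util = name, util
    return best_name

history_emb = rng.normal(size=DIM)  # embedding of the conversation so far
choice = route_turn(history_emb, budget_left=5.0)  # large-llm is unaffordable
print(choice)
```

In the actual method, the utility estimator is trained on logged trajectories and the history embedding is updated after every turn, so the chosen model can change as the conversation evolves.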
Related Articles
How to Build Traceable and Evaluated LLM Workflows Using Promptflow, Prompty, and OpenAI
MarkTechPost

An improvement of the convergence proof of the ADAM-Optimizer
Dev.to
Where Is the Claude Code Session History? How to Recover Your AI Coding Conversation Records
Dev.to
We built an AI that runs an entire business autonomously. Not a demo. Not a prototype. Actually running. YC-backed, here's what we learned.
Reddit r/artificial
langchain-tests==1.1.7
LangChain Releases