MTRouter: Cost-Aware Multi-Turn LLM Routing with History-Model Joint Embeddings
arXiv cs.CL / 4/28/2026
Key Points
- The paper proposes MTRouter, a cost-aware routing method that chooses which LLM from a model pool to call at each turn under a fixed inference budget.
- MTRouter represents both conversation history and candidate models using joint history–model embeddings, then learns a per-turn outcome (utility) estimator from logged trajectories.
- Experiments on ScienceWorld show MTRouter beating GPT-5 while cutting total inference cost by 58.7%.
- On Humanity’s Last Exam (HLE), it reaches competitive accuracy with a 43.4% total cost reduction versus GPT-5, and the improvements generalize to held-out tasks.
- Analysis attributes the gains to fewer model switches, higher tolerance to transient errors, and emergent specialization behavior across models.
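The per-turn selection loop described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the model pool, embedding dimension, and dot-product utility stub are all assumptions standing in for MTRouter's learned joint history–model embeddings and outcome estimator.

```python
import numpy as np

# Hypothetical sketch of cost-aware per-turn routing, assuming:
#  - a fixed pool of candidate LLMs, each with an embedding and a per-call cost
#  - a learned per-turn utility estimator, stubbed here as a dot product
#    between the conversation-history embedding and each model embedding.
# None of these names or values come from the paper; they only illustrate
# "pick the affordable model with the highest estimated utility".

rng = np.random.default_rng(0)
DIM = 8  # toy embedding dimension

MODEL_POOL = {
    "small-llm":  {"emb": rng.normal(size=DIM), "cost": 1.0},
    "medium-llm": {"emb": rng.normal(size=DIM), "cost": 3.0},
    "large-llm":  {"emb": rng.normal(size=DIM), "cost": 10.0},
}

def estimate_utility(history_emb, model_emb):
    """Stand-in for the learned outcome (utility) estimator."""
    return float(history_emb @ model_emb)

def route_turn(history_emb, budget_left):
    """Choose the highest-utility model whose cost fits the remaining budget."""
    best_name, best_util = None, -np.inf
    for name, info in MODEL_POOL.items():
        if info["cost"] > budget_left:
            continue  # skip models the remaining budget cannot pay for
        util = estimate_utility(history_emb, info["emb"])
        if util > best_util:
            best_name, best_util = name, util
    return best_name

history_emb = rng.normal(size=DIM)  # embedding of the conversation so far
choice = route_turn(history_emb, budget_left=5.0)  # large-llm is unaffordable
print(choice)
```

In the actual method, the utility estimator is trained on logged trajectories and the history embedding is updated after every turn, so the chosen model can change as the conversation evolves.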
Related Articles
How to Build Traceable and Evaluated LLM Workflows Using Promptflow, Prompty, and OpenAI
MarkTechPost

An improvement of the convergence proof of the ADAM-Optimizer
Dev.to
Where Is the Claude Code Session History? How to Recover Your AI Coding Conversation Records
Dev.to
We built an AI that runs an entire business autonomously. Not a demo. Not a prototype. Actually running. YC-backed, here's what we learned.
Reddit r/artificial
langchain-tests==1.1.7
LangChain Releases