RouteLMT: Learned Sample Routing for Hybrid LLM Translation Deployment

arXiv cs.CL · April 27, 2026


Key Points

  • The paper addresses the high cost of deploying LLMs for machine translation by using a hybrid setup that routes only a fraction of requests to a large model while the rest go to a small model.
  • It reframes routing as a budget allocation problem and defines the key decision signal as the “marginal gain,” meaning the improvement the large model provides over the small model.
  • RouteLMT is introduced as an efficient in-model router that predicts this expected gain by probing the small translator’s prompt-token representation, avoiding reliance on external predictors or hypothesis decoding.
  • Experiments show RouteLMT beats heuristic and quality/difficulty estimation baselines, producing a better quality–budget Pareto frontier.
  • The authors also study regression risks and propose a guarded variant to reduce the chance of severe quality drops.
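The budgeted-routing idea in the points above can be sketched concretely: given a predicted marginal gain per request and a fixed budget (the fraction of traffic the large model may serve), send the highest-gain requests to the large model. This is a minimal NumPy illustration of that allocation rule, not the paper's implementation; the function name and the uniform per-request cost are assumptions.

```python
import numpy as np

def route_by_marginal_gain(predicted_gains, budget_fraction):
    """Route the requests with the highest predicted marginal gain
    (large-model quality minus small-model quality) to the large model,
    up to the budget. Assumes every request costs the same to serve."""
    n = len(predicted_gains)
    k = int(n * budget_fraction)                # requests the budget allows
    order = np.argsort(predicted_gains)[::-1]   # indices, descending by gain
    to_large = np.zeros(n, dtype=bool)
    to_large[order[:k]] = True                  # True = route to large model
    return to_large

gains = np.array([0.10, -0.20, 0.50, 0.05, 0.30])
mask = route_by_marginal_gain(gains, budget_fraction=0.4)
# the two highest-gain requests (indices 2 and 4) go to the large model
```

Note that a negative predicted gain (index 1 above) marks a request where the large model is expected to do no better, which is exactly the signal absolute quality estimation misses.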

Abstract

Large Language Models (LLMs) have achieved remarkable performance in Machine Translation (MT), but deploying them at scale remains prohibitively expensive. A widely adopted remedy is the hybrid system paradigm, which balances cost and quality by serving most requests with a small model and selectively routing a fraction to a large model. However, existing routing strategies often rely on heuristics, external predictors, or absolute quality estimation, which fail to capture whether the large model actually provides a worthwhile improvement over the small one. In this paper, we formulate routing as a budget allocation problem and identify marginal gain, i.e., the large model's improvement over the small model, as the optimal signal for budgeted decisions. Building on this, we propose RouteLMT (routing for LLM-based MT), an efficient in-model router that predicts this expected gain by probing the small translator's prompt-token representation, without requiring external models or hypothesis decoding. Extensive experiments demonstrate that RouteLMT outperforms heuristic and quality/difficulty-estimation baselines, achieving a superior quality-budget Pareto frontier. Furthermore, we analyze regression risks and show that a simple guarded variant can mitigate severe quality losses.
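The "in-model probe" described in the abstract amounts to a small regression head that reads the small translator's prompt-token representation and outputs a scalar expected gain, so no second decoding pass or external model is needed. The sketch below, in plain NumPy, shows the shape of such a probe; the two-layer MLP form, the dimensions, and all names are illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

class GainProbe:
    """Tiny MLP head over the small translator's prompt-token hidden
    state; predicts the expected marginal gain of escalating to the
    large model. Architecture and sizes are illustrative assumptions."""
    def __init__(self, hidden_dim, width=64):
        self.W1 = rng.standard_normal((hidden_dim, width)) * 0.02
        self.b1 = np.zeros(width)
        self.W2 = rng.standard_normal((width, 1)) * 0.02
        self.b2 = np.zeros(1)

    def __call__(self, h):
        # h: (batch, hidden_dim) prompt-token representations
        z = np.maximum(h @ self.W1 + self.b1, 0.0)   # ReLU hidden layer
        return (z @ self.W2 + self.b2).squeeze(-1)   # (batch,) scalar gains

probe = GainProbe(hidden_dim=32)
h = rng.standard_normal((4, 32))   # stand-in for prompt-token states
pred_gain = probe(h)               # one predicted gain per request
```

Because the probe consumes a representation the small model computes anyway while encoding the prompt, the routing decision can be made before any hypothesis is decoded, which is what keeps the router cheap.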
