No One Fits All: From Fixed Prompting to Learned Routing in Multilingual LLMs

arXiv cs.CL / 4/21/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The study finds that translation-based prompting, often used in multilingual LLMs, does not perform best across all languages and tasks, with effectiveness varying by resource level.
For low-resource languages, translation prompting still provides strong gains even when translation quality is imperfect, while high-resource languages see little improvement.
Prompt-based self-routing is shown to underperform explicit translation, suggesting that learned selection beats routing approaches in this setting.
The authors reformulate prompting strategy selection as a learned decision problem and propose lightweight classifiers to choose between native and translation-based prompting, achieving statistically significant gains across four benchmarks and generalizing to unseen task formats.
Further analysis indicates that whether translation helps depends more on the language’s resource level than on translation quality alone.

Abstract

Translation-based prompting is widely used in multilingual LLMs, yet its effectiveness varies across languages and tasks. We evaluate prompting strategies across ten languages of different resource levels and four benchmarks. Our analysis shows that no single strategy is universally optimal. Translation strongly benefits low-resource languages even when translation quality is imperfect, high-resource languages gain little, and prompt-based self-routing underperforms explicit translation. Motivated by these findings, we formulate prompting strategy selection as a learned decision problem and introduce lightweight classifiers that predict whether native or translation-based prompting is optimal for each instance. The classifiers achieve statistically significant improvements over fixed strategies across four benchmarks and generalize to unseen task formats not observed during training. Further analysis reveals that language resource level, rather than translation quality alone, determines when translation is beneficial.

Every time a new model comes out, the old one is obsolete of course

Reddit r/LocalLLaMA

We built it during the NVIDIA DGX Spark Full-Stack AI Hackathon — and it ended up winning 1st place overall 🏆

Dev.to

Stop Losing Progress: Setting Up a Pro Jupyter Workflow in VS Code (No More Colab Timeouts!)

Dev.to

Building AgentOS: Why I’m Building the AWS Lambda for Insurance Claims

Dev.to

Where we are. In a year, everything has changed. Kimi - Minimax - Qwen - Gemma - GLM

Reddit r/LocalLLaMA

No One Fits All: From Fixed Prompting to Learned Routing in Multilingual LLMs

Key Points

Abstract

Related Articles

Every time a new model comes out, the old one is obsolete of course

We built it during the NVIDIA DGX Spark Full-Stack AI Hackathon — and it ended up winning 1st place overall 🏆

Stop Losing Progress: Setting Up a Pro Jupyter Workflow in VS Code (No More Colab Timeouts!)

Building AgentOS: Why I’m Building the AWS Lambda for Insurance Claims

Where we are. In a year, everything has changed. Kimi - Minimax - Qwen - Gemma - GLM

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer