No One Fits All: From Fixed Prompting to Learned Routing in Multilingual LLMs

arXiv cs.CL / 4/21/2026

Key Points

  • The study finds that translation-based prompting, widely used with multilingual LLMs, is not the best strategy for every language and task; its effectiveness varies with the language's resource level.
  • For low-resource languages, translation prompting still provides strong gains even when translation quality is imperfect, while high-resource languages see little improvement.
  • Prompt-based self-routing, in which the model itself is prompted to pick a strategy, underperforms explicit translation, which motivates replacing it with a learned selector.
  • The authors reformulate prompting strategy selection as a learned decision problem and propose lightweight classifiers to choose between native and translation-based prompting, achieving statistically significant gains across four benchmarks and generalizing to unseen task formats.
  • Further analysis indicates that whether translation helps depends more on the language’s resource level than on translation quality alone.

Abstract

Translation-based prompting is widely used in multilingual LLMs, yet its effectiveness varies across languages and tasks. We evaluate prompting strategies across ten languages of different resource levels and four benchmarks. Our analysis shows that no single strategy is universally optimal. Translation strongly benefits low-resource languages even when translation quality is imperfect, high-resource languages gain little, and prompt-based self-routing underperforms explicit translation. Motivated by these findings, we formulate prompting strategy selection as a learned decision problem and introduce lightweight classifiers that predict whether native or translation-based prompting is optimal for each instance. The classifiers achieve statistically significant improvements over fixed strategies across four benchmarks and generalize to unseen task formats not observed during training. Further analysis reveals that language resource level, rather than translation quality alone, determines when translation is beneficial.
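
To make the idea of a learned routing decision concrete, the following is a minimal sketch of a per-instance router. It is not the authors' implementation: the abstract does not specify the classifier type or its input features, so the logistic-regression model, the two features (a low-resource flag and a normalized prompt length), the toy training data, and the names `train_router` and `route` are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_router(examples, labels, lr=0.5, epochs=500):
    """Fit a logistic-regression router with full-batch gradient descent.

    examples: list of feature vectors (hypothetical features, see below).
    labels:   1 if translation-based prompting was better for that
              instance, 0 if native prompting was better.
    """
    n_features = len(examples[0])
    n = len(examples)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(examples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def route(model, x):
    """Return the prompting strategy predicted to work better for x."""
    weights, bias = model
    p_translate = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return "translate" if p_translate > 0.5 else "native"


# Toy, made-up training data. Assumed features per instance:
# [is_low_resource_language (0 or 1), normalized_prompt_length].
train_x = [
    [1.0, 0.2], [1.0, 0.8], [1.0, 0.5],  # low-resource instances
    [0.0, 0.3], [0.0, 0.7], [0.0, 0.5],  # high-resource instances
]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = translation-based prompting was better

model = train_router(train_x, train_y)

low_resource_choice = route(model, [1.0, 0.4])
high_resource_choice = route(model, [0.0, 0.6])
```

The toy labels simply encode the abstract's finding that translation helps low-resource languages most, so the router learns to send low-resource instances to translation and high-resource instances to native prompting. A real router would be trained on per-instance outcomes measured on actual benchmarks and would likely use richer features.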