An Empirical Study of Many-Shot In-Context Learning for Machine Translation of Low-Resource Languages

arXiv cs.CL / 4/6/2026


Key Points

  • The paper presents an empirical evaluation of many-shot in-context learning (ICL) for machine translation from English into ten truly low-resource languages recently added to FLORES+.
  • It finds that translation quality generally improves as the number of ICL examples increases, highlighting the benefit of longer-context prompting for low-resource settings.
  • The study shows that BM25-based retrieval of more informative examples substantially improves data efficiency, with 50 retrieved examples performing similarly to about 250 many-shot examples.
  • Using 250 retrieved examples yields results comparable to using roughly 1,000 many-shot examples, suggesting retrieval can reduce inference cost while maintaining effectiveness.
  • The authors also analyze how factors like example retrieval quality, out-of-domain data, and ordering by length affect many-shot ICL performance.
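The BM25-based example retrieval in the key points above can be sketched as follows. This is a minimal, self-contained illustration (a from-scratch BM25 scorer rather than the authors' actual pipeline); the function names, tokenization by whitespace, and the `k1`/`b` defaults are assumptions, not details from the paper.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` against `query` with standard BM25."""
    docs = [doc.lower().split() for doc in corpus]
    q_terms = query.lower().split()
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each term across the corpus.
    df = Counter()
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in q_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores

def retrieve_examples(src_sentence, pool, k=50):
    """Return the top-k (source, target) pairs whose English side
    best matches the sentence to be translated."""
    scores = bm25_scores(src_sentence, [src for src, _ in pool])
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return [pool[i] for i in ranked[:k]]
```

In this framing, `k=50` retrieved examples would stand in for a much larger unfiltered many-shot prompt, which is the data-efficiency effect the paper reports.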

Abstract

In-context learning (ICL) allows large language models (LLMs) to adapt to new tasks from a few examples, making it promising for languages underrepresented in pre-training. Recent work on many-shot ICL suggests that modern LLMs can further benefit from larger numbers of ICL examples enabled by their long context windows. However, such gains depend on careful example selection, and the inference cost can be prohibitive for low-resource language communities. In this paper, we present an empirical study of many-shot ICL for machine translation from English into ten truly low-resource languages recently added to FLORES+. We analyze the effects of retrieving more informative examples, using out-of-domain data, and ordering examples by length. Our findings show that many-shot ICL becomes more effective as the number of examples increases. More importantly, we show that BM25-based retrieval substantially improves data efficiency: 50 retrieved examples roughly match 250 many-shot examples, while 250 retrieved examples perform similarly to 1,000 many-shot examples.
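The abstract mentions ordering examples by length as one of the analyzed factors. A minimal sketch of assembling a many-shot translation prompt with such an ordering might look like the following; the prompt template, the shortest-first direction, and the `tgt_lang` placeholder are all illustrative assumptions, not the paper's actual setup.

```python
def build_prompt(examples, src_sentence, tgt_lang="TargetLanguage"):
    """Assemble a many-shot English->target translation prompt.

    Examples are sorted shortest-first (by English word count) so the
    test sentence sits adjacent to the longest demonstrations. The paper
    studies length ordering; which direction helps is an empirical
    question, so this choice is purely illustrative.
    """
    ordered = sorted(examples, key=lambda pair: len(pair[0].split()))
    blocks = [f"English: {src}\n{tgt_lang}: {tgt}" for src, tgt in ordered]
    # The sentence to translate goes last, with the target side left open.
    blocks.append(f"English: {src_sentence}\n{tgt_lang}:")
    return "\n\n".join(blocks)
```

The resulting string would be sent as a single long-context prompt; with hundreds of retrieved examples, prompt length (and hence inference cost) grows roughly linearly with the number of shots, which is the cost concern the abstract raises.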