English is Not All You Need: Systematically Exploring the Role of Multilinguality in LLM Post-Training

arXiv cs.AI / 4/16/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The study argues that most LLM post-training pipelines are still English-centric, which can lead to uneven performance across languages despite widespread multilingual deployment.
  • Across 220 controlled supervised fine-tuning runs on models up to 8B parameters, trained on parallel translated multilingual data mixtures for mathematical reasoning and API-calling tasks, the paper finds that expanding language coverage during post-training is generally beneficial.
  • Low-resource languages gain the most from added coverage, while high-resource languages tend to plateau rather than degrade as more languages are included.
  • Even minimal multilinguality (adding a single non-English language) can improve English performance and cross-lingual generalization, making English-only post-training largely suboptimal.
  • At sufficiently high language diversity, zero-shot cross-lingual transfer can match or exceed the benefits of directly including languages in low-diversity settings, though typologically distant low-resource languages still see limited improvements.

Abstract

Despite the widespread multilingual deployment of large language models, post-training pipelines remain predominantly English-centric, contributing to performance disparities across languages. We present a systematic, controlled study of the interplay between training language coverage, model scale, and task domain, based on 220 supervised fine-tuning runs on parallel translated multilingual data mixtures spanning mathematical reasoning and API calling tasks, with models up to 8B parameters. We find that increasing language coverage during post-training is largely beneficial across tasks and model scales, with low-resource languages benefiting the most and high-resource languages plateauing rather than degrading. Even minimal multilinguality helps: incorporating a single non-English language improves both English performance and cross-lingual generalization, making English-only post-training largely suboptimal. Moreover, at sufficient language diversity, zero-shot cross-lingual transfer can match or exceed the effects of direct language inclusion in a low-diversity setting, although gains remain limited for typologically distant, low-resource languages.
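
The paper does not publish its data pipeline, but the controlled setup it describes, varying which languages appear in a parallel translated SFT mixture while holding other factors fixed, can be illustrated with a short sketch. The following Python is a hypothetical construction, not the authors' code: the language codes, field names, and even-split budget rule are assumptions made for illustration.

```python
# Hypothetical sketch of building SFT mixtures with varying language coverage
# from a parallel, translated pool. Field names, language codes, and the
# even budget split are illustrative assumptions, not the paper's pipeline.
import random
from typing import Dict, List

# Each seed example is assumed to have aligned translations keyed by language
# code, i.e. the data is parallel across languages.
ParallelExample = Dict[str, Dict[str, str]]  # lang -> {"prompt": ..., "response": ...}


def build_mixture(
    pool: List[ParallelExample],
    languages: List[str],
    budget: int,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Sample `budget` training examples, split evenly across `languages`.

    Holding the total budget fixed while varying the language list isolates
    the effect of language coverage from the effect of simply adding data.
    """
    rng = random.Random(seed)
    per_lang = budget // len(languages)
    mixture: List[Dict[str, str]] = []
    for lang in languages:
        chosen = rng.sample(pool, per_lang)
        mixture.extend(ex[lang] for ex in chosen)
    rng.shuffle(mixture)
    return mixture


if __name__ == "__main__":
    # Toy parallel pool: the same math-style prompt translated into 4 languages.
    pool = [
        {
            "en": {"prompt": f"Solve problem {i}", "response": f"Answer {i}"},
            "de": {"prompt": f"Löse Aufgabe {i}", "response": f"Antwort {i}"},
            "sw": {"prompt": f"Tatua tatizo {i}", "response": f"Jibu {i}"},
            "ja": {"prompt": f"問題{i}を解け", "response": f"答え{i}"},
        }
        for i in range(1000)
    ]
    # English-only vs. 4-language mixture on the same example budget.
    english_only = build_mixture(pool, ["en"], budget=400)
    multilingual = build_mixture(pool, ["en", "de", "sw", "ja"], budget=400)
    print(len(english_only), len(multilingual))  # 400 400: identical data budget
```

Keeping the example budget constant across conditions is what makes the comparison a coverage study rather than a data-scaling study; the paper's actual mixture proportions and sampling scheme may differ from this even split.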