NiuTrans.LMT: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs

arXiv cs.CL / 4/27/2026

💬 Opinion · Tools & Practical Usage · Models & Research

Key Points

  • The paper identifies a specific failure mode in multilingual supervised fine-tuning on multi-way parallel data: when such data are reused symmetrically around a pivot language (e.g., English), quality can drop sharply in the reverse translation directions (X→pivot).
  • It explains this “Directional Degeneration” as being driven by excessive many-to-one mappings that encourage shortcut learning, and proposes Strategic Downsampling (SD) to mitigate the issue.
  • The authors introduce Parallel Multilingual Prompting (PMP), which adds auxiliary parallel sentences to translation instructions to improve cross-lingual transfer during training and can optionally enhance results at test time.
  • They release NiuTrans.LMT, a Chinese–English-centric multilingual translation model suite (four model sizes from 0.6B to 8B) covering 60 languages and 234 translation directions; evaluations show it is competitive among open-source MMT systems, and the 4B model matches or exceeds much larger baselines.
  • The work is positioned as enabling more inclusive and scalable multilingual machine translation by providing both model releases and project resources for the community.
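The paper does not spell out the downsampling policy in this summary, but the core idea behind Strategic Downsampling — capping how many-to-one the X→pivot mappings become when multi-way parallel data are reused symmetrically — can be sketched as follows. The function name, the per-target cap, and the dict schema are illustrative assumptions, not the authors' actual implementation:

```python
import random
from collections import defaultdict

def strategic_downsample(pairs, pivot="en", max_per_target=2, seed=0):
    """Hypothetical sketch of Strategic Downsampling (SD).

    `pairs` is a list of dicts with keys src, tgt, src_lang, tgt_lang.
    In a multi-way corpus reused symmetrically around a pivot, many
    source sentences in different languages map onto the *same*
    pivot-side target, which encourages shortcut learning. Here we
    cap the number of X->pivot examples sharing one pivot target.
    """
    rng = random.Random(seed)
    by_target = defaultdict(list)   # X -> pivot examples, grouped by target
    passthrough = []                # pivot -> X examples, kept untouched
    for p in pairs:
        if p["tgt_lang"] == pivot:
            by_target[p["tgt"]].append(p)
        else:
            passthrough.append(p)
    kept = []
    for group in by_target.values():
        rng.shuffle(group)
        kept.extend(group[:max_per_target])  # downsample the many-to-one fan-in
    return passthrough + kept
```

The `max_per_target=2` cap is an arbitrary placeholder; the actual SD schedule in the paper may weight directions or languages differently.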

Abstract

Large language models have significantly advanced Multilingual Machine Translation (MMT), yet scaling to many languages while keeping quality robust across directions remains challenging. In this paper, we identify a failure mode of multilingual supervised fine-tuning (SFT) on multi-way parallel data: when such data are reused symmetrically around a pivot language (e.g., English), performance on reverse directions (X → pivot) can drop substantially. We term this phenomenon Directional Degeneration and attribute it to excessive many-to-one mappings, which encourage shortcut learning. We propose Strategic Downsampling (SD), a simple yet effective method to mitigate this degeneration. In addition, we introduce Parallel Multilingual Prompting (PMP), which augments translation instructions with an auxiliary parallel sentence to promote cross-lingual transfer during training and enables optional test-time enhancement when auxiliary translations are available. We further develop **NiuTrans.LMT** (**L**arge-scale **M**ultilingual **T**ranslation, abbreviated as **LMT**), a Chinese–English-centric suite of multilingual translation models spanning four sizes (0.6B/1.7B/4B/8B) and covering 60 languages and 234 directions. Comprehensive evaluations show that LMT is competitive among open-source MMT systems, and that our 4B LMT model performs on par with or better than substantially larger baselines. We release our models and project resources to support inclusive and scalable MMT.
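To make the PMP idea concrete, a prompt builder of the kind the abstract describes — a translation instruction optionally prefixed with an auxiliary parallel sentence in a third language — might look like the sketch below. The template wording and function signature are assumptions for illustration; the released LMT models may use a different prompt format:

```python
def build_pmp_prompt(src_text, src_lang, tgt_lang, aux_text=None, aux_lang=None):
    """Hypothetical Parallel Multilingual Prompting (PMP) template.

    When an auxiliary translation of the source is available (at training
    time, or optionally at test time), it is prepended to the instruction
    to encourage cross-lingual transfer; otherwise the prompt reduces to
    a plain translation instruction.
    """
    lines = []
    if aux_text is not None and aux_lang is not None:
        lines.append(f"{aux_lang} reference: {aux_text}")
    lines.append(f"Translate the following {src_lang} text into {tgt_lang}:")
    lines.append(src_text)
    return "\n".join(lines)

# With an auxiliary parallel sentence (PMP-style prompt):
pmp = build_pmp_prompt("Bonjour le monde", "French", "English",
                       aux_text="Hola mundo", aux_lang="Spanish")
# Without one, the same builder yields an ordinary translation instruction:
plain = build_pmp_prompt("Bonjour le monde", "French", "English")
```

Because the auxiliary sentence is optional, the same model can be queried with or without it, which is what enables the test-time enhancement the abstract mentions.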