Omnilingual MT: Machine Translation for 1,600 Languages
arXiv cs.CL / 3/18/2026
Key Points
- Omnilingual Machine Translation (OMT) is reported as the first MT system to support more than 1,600 languages, marking a major expansion in multilingual coverage.
- The scale is enabled by a data strategy that combines large public multilingual corpora with newly created datasets, including manually curated MeDLEY bitext.
- The paper explores two approaches to specializing LLMs for translation — as a decoder-only model (OMT-LLaMA) and as a module within an encoder-decoder architecture (OMT-NLLB) — with 1B–8B-parameter models matching or exceeding a 70B-parameter LLM MT baseline.
- Evaluations of translation from English into all 1,600 languages show that while baseline models can often interpret under-supported languages, they frequently fail to generate faithful output in them; OMT improves coherent generation and cross-lingual transfer.
- The leaderboard and evaluation datasets (BOUQuET and Met-BOUQuET) are being extended toward omnilingual coverage and will be freely available.