Omnilingual MT: Machine Translation for 1,600 Languages
arXiv cs.CL / 3/18/2026
Key Points
- Omnilingual Machine Translation (OMT) is reported as the first MT system to support more than 1,600 languages, marking a major expansion in multilingual coverage.
- The scale is enabled by a data strategy that combines large public multilingual corpora with newly created datasets, including manually curated MeDLEY bitext.
- The paper explores two LLM specialization approaches — as a decoder-only model (OMT-LLaMA) and as a module in an encoder-decoder architecture (OMT-NLLB) — with 1B–8B parameter models matching or exceeding a 70B LLM MT baseline.
- English-to-1,600 evaluations show that while baselines can often interpret under-supported languages, they frequently fail to generate them faithfully; OMT improves both coherent generation and cross-lingual transfer.
- The leaderboard and evaluation datasets (BOUQuET and Met-BOUQuET) are evolving toward Omnilinguality and will be freely available.