Towards High-Quality Machine Translation for Kokborok: A Low-Resource Tibeto-Burman Language of Northeast India
arXiv cs.CL / 4/23/2026
Key Points
- The study introduces KokborokMT, a neural machine translation system for Kokborok, a low-resource Tibeto-Burman language spoken in Tripura, India.
- The authors fine-tune NLLB-200-distilled-600M using a multi-source parallel dataset totaling 36,052 sentence pairs, combining professional translations, Bible-domain data, and synthetic back-translations generated with Gemini Flash.
- They extend the NLLB framework with a dedicated Kokborok language token so that the model supports the language directly.
- The best model reaches BLEU scores of 17.30 and 38.56 on held-out test sets, and human evaluators rate its output 3.74/5 for adequacy and 3.70/5 for fluency.
- These results substantially outperform earlier MT attempts, which were trained on small Bible-derived corpora and achieved BLEU scores below 7.
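
For readers unfamiliar with the BLEU figures quoted above, the following is a minimal sketch of how a corpus-level BLEU score is computed: clipped n-gram precision for n = 1 to 4, combined by geometric mean and scaled by a brevity penalty. It assumes whitespace tokenization, a single reference per sentence, and no smoothing. The summary does not say which BLEU implementation or tokenizer the authors used, so scores from this sketch are not directly comparable to the reported 17.30 and 38.56. The function names and sample sentences are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    Assumptions (not taken from the paper): whitespace tokenization,
    one reference per hypothesis, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, any order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    # Placeholder English sentences, not data from the study.
    hyps = ["the river flows past the village", "she reads a book"]
    refs = ["the river flows past the old village", "she reads a book"]
    print(f"BLEU = {corpus_bleu(hyps, refs):.2f}")
```

In practice a standard tool such as sacreBLEU is preferable, because tokenization and smoothing choices can shift BLEU by several points and make scores from different papers hard to compare.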