Developing an English-Efik Corpus and Machine Translation System for Digitization Inclusion
arXiv cs.CL / 3/17/2026
Key Points
- The study targets English↔Efik translation, a low-resource language pair, using a small parallel corpus of 13,865 sentence pairs.
- It compares fine-tuning two multilingual MT models, mT5 and NLLB-200; NLLB-200 performs best, with BLEU scores of 26.64 (English→Efik) and 31.21 (Efik→English) and chrF scores of 51.04 and 47.92, respectively.
- The results demonstrate that practical MT tools are feasible for low-resource languages, and the authors stress inclusive data practices and culturally grounded evaluation for equitable NLP.
- The work highlights digitization inclusion and charts a path toward broader representation of underrepresented languages in NLP research.
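The chrF scores cited above are character n-gram F-scores, which tend to be more forgiving than BLEU for morphologically rich, low-resource languages. As a rough illustration only (not the paper's evaluation pipeline, which presumably uses a standard toolkit such as sacreBLEU), a simplified chrF can be sketched like this, assuming whitespace-stripped character n-grams up to order 6 and β = 2:

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    # chrF operates on character n-grams with spaces removed
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average character n-gram precision/recall,
    combined into an F-beta score and scaled to 0-100."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / hyp_total)
        recalls.append(overlap / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

This toy version omits sacreBLEU's word-order component and exact averaging details, but shows why chrF rewards partial morphological matches that BLEU's word-level n-grams would score as zero.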