Developing an English-Efik Corpus and Machine Translation System for Digitization Inclusion
arXiv cs.CL · March 17, 2026
Key Points
- The study targets machine translation between English and Efik, a low-resource Nigerian language, using a small parallel corpus of 13,865 sentence pairs.
- It compares fine-tuning two multilingual MT models, mT5 and NLLB-200; NLLB-200 performs best, with BLEU scores of 26.64 (English→Efik) and 31.21 (Efik→English) and chrF scores of 51.04 and 47.92, respectively.
- The results demonstrate the feasibility of practical MT tools for low-resource languages and stress inclusive data practices and culturally grounded evaluation for equitable NLP.
- The work highlights digitization inclusion and provides a path for broader representation of underrepresented languages in NLP research.
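The paper reports chrF alongside BLEU, since character-level scoring is often more forgiving for morphologically rich, low-resource languages. As a rough illustration of what chrF measures, here is a toy, self-contained sketch (real evaluations use the sacrebleu implementation; the simplified whitespace handling and unweighted n-gram averaging here are assumptions for brevity):

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    # Character n-grams with whitespace stripped (a common chrF convention).
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Toy chrF: average char n-gram precision/recall, combined via F-beta.

    beta=2 weights recall twice as heavily as precision, as in the
    standard chrF metric. Returns a score in [0, 100].
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # n-gram order longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / hyp_total)
        recalls.append(overlap / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r) * 100

# An identical hypothesis scores 100; a partial match scores in between.
print(round(chrf("translation", "translation"), 2))
print(round(chrf("abc", "abd"), 2))
```

A chrF of ~51, as reported for English→Efik, thus means roughly half of the character n-grams are shared with the reference, which can still correspond to usable translations even when word-level BLEU is modest.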