AFRILANGTUTOR: Advancing Language Tutoring and Culture Education in Low-Resource Languages with Large Language Models
arXiv cs.CL / 4/24/2026
Key Points
- The paper addresses how to build language learning and tutoring systems for low-resource African languages that lack sufficient training data.
- It introduces AFRILANGDICT (194.7K dictionary entries) as seed material to automatically generate verifiable student–tutor question-answer interactions, and AFRILANGEDU (78.9K multi-turn examples) for training.
- Using AFRILANGEDU, the authors train AFRILANGTUTOR by fine-tuning two multilingual LLMs (Llama-3-8B-IT and Gemma-3-12B-IT) across 10 African languages.
- The fine-tuned models outperform their base versions, and combining supervised fine-tuning (SFT) with direct preference optimization (DPO) yields further gains of 1.8% to 15.5% under LLM-as-a-judge evaluation across multiple criteria.
- All datasets and resources are released publicly on Hugging Face to support further research and development for low-resource language education.
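To make the SFT + DPO combination above concrete, here is a minimal sketch of the DPO objective applied to one preference pair. This is illustrative only, not the paper's training code: the log-probabilities, the `beta` value, and the function name are assumptions for demonstration. DPO trains the policy to widen the gap between its log-probability ratios (policy vs. a frozen reference model) for the preferred response over the rejected one.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a response under
    the trained policy or the frozen reference model; beta (an assumed
    value here) controls how far the policy may drift from the reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)): minimized by preferring the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When policy and reference agree exactly, logits = 0 and loss = ln 2
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931

# Once the policy favors the chosen response more than the reference
# does, the loss drops below ln 2
print(dpo_loss(-8.0, -14.0, -10.0, -12.0) < math.log(2))  # → True
```

In practice libraries such as TRL wrap this objective with batching and reference-model handling; the point here is only that the preference signal comes from log-ratio differences, which is why DPO can refine an SFT model without a separate reward model.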
