AFRILANGTUTOR: Advancing Language Tutoring and Culture Education in Low-Resource Languages with Large Language Models
arXiv cs.CL / 4/24/2026
Key Points
- The paper addresses how to build language learning and tutoring systems for low-resource African languages that lack sufficient training data.
- It introduces AFRILANGDICT (194.7K dictionary entries) as seed material to automatically generate verifiable student–tutor question-answer interactions, and AFRILANGEDU (78.9K multi-turn examples) for training.
- Using AFRILANGEDU, the authors train AFRILANGTUTOR by fine-tuning two multilingual LLMs (Llama-3-8B-IT and Gemma-3-12B-IT) across 10 African languages.
- The fine-tuned models outperform their base versions, and combining supervised fine-tuning (SFT) with direct preference optimization (DPO) yields further gains of 1.8% to 15.5% under LLM-as-a-judge evaluation across multiple criteria.
- All datasets and resources are released publicly on Hugging Face to support further research and development for low-resource language education.
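To make the SFT + DPO combination above concrete, here is a minimal sketch of the DPO objective applied to one preference pair. This is illustrative only, not the paper's training code: the log-probabilities, the `beta` value, and the function name are assumptions for demonstration. DPO trains the policy to widen the gap between its log-probability ratios (policy vs. a frozen reference model) for the preferred response over the rejected one.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a response under
    the trained policy or the frozen reference model; beta (an assumed
    value here) controls how far the policy may drift from the reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)): minimized by preferring the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When policy and reference agree exactly, logits = 0 and loss = ln 2
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931

# Once the policy favors the chosen response more than the reference
# does, the loss drops below ln 2
print(dpo_loss(-8.0, -14.0, -10.0, -12.0) < math.log(2))  # → True
```

In practice libraries such as TRL wrap this objective with batching and reference-model handling; the point here is only that the preference signal comes from log-ratio differences, which is why DPO can refine an SFT model without a separate reward model.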
