Zero-Shot Morphological Discovery in Low-Resource Bantu Languages via Cross-Lingual Transfer and Unsupervised Clustering
arXiv cs.LG / 4/27/2026
Key Points
- The paper proposes a zero-shot approach to discover morphological features in low-resource Bantu languages by combining cross-lingual transfer learning with unsupervised clustering.
- Using Giriama (nyf) as a test case with only 91 labeled paradigms, the method assigns noun classes for 2,455 words and uncovers two previously undocumented morphological patterns with high consistency.
- External validation on 444 known Giriama verb paradigms yields 78.2% lemmatization accuracy, and a larger v3 corpus expansion (19,624 words) improves performance to 97.3% segmentation and 86.7% lemmatization across major word classes.
- The authors argue that a weighted-voting ensemble works best because transfer learning captures cognates via substantial vocabulary overlap (~60%), while clustering identifies language-specific innovations that transfer may miss.
- All code and the discovered lexicons are released to support morphological documentation efforts for other low-resource Bantu languages.
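
The weighted-voting idea in the fourth point can be illustrated with a minimal sketch. This is not the authors' released implementation: the function name `weighted_vote`, the source names `transfer` and `clustering`, the weights, the confidence values, and the noun-class labels are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-source predictions into a single label.

    predictions: dict mapping source name -> (label, confidence in [0, 1])
    weights:     dict mapping source name -> non-negative weight
    Returns (winning_label, share_of_total_score).
    """
    scores = defaultdict(float)
    for source, (label, confidence) in predictions.items():
        # Each source contributes its weight scaled by its own confidence.
        scores[label] += weights[source] * confidence

    best = max(scores, key=scores.get)
    total = sum(scores.values())
    share = scores[best] / total if total else 0.0
    return best, share

# Hypothetical weights; the summary does not state the values used in the paper.
WEIGHTS = {"transfer": 0.7, "clustering": 0.3}

# Cognate-like case: the transfer model is confident, so its label wins.
cognate = weighted_vote(
    {"transfer": ("class_7", 0.9), "clustering": ("class_9", 0.6)}, WEIGHTS
)

# Innovation-like case: transfer is unsure, so the clustering label wins.
innovation = weighted_vote(
    {"transfer": ("class_7", 0.2), "clustering": ("class_9", 0.8)}, WEIGHTS
)

print(cognate[0], innovation[0])
```

Scaling each vote by the source's confidence lets a low-confidence transfer prediction be overridden by a confident cluster assignment, which is one plausible way to realize the complementary behavior the authors describe.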