YoNER: A New Yorùbá Multi-domain Named Entity Recognition Dataset
arXiv cs.CL / 4/8/2026
Key Points
- The paper introduces YoNER, a new multi-domain Yoruba Named Entity Recognition dataset (about 5,000 sentences / 100,000 tokens) spanning Bible, Blogs, Movies, Radio broadcasts, and Wikipedia, and annotated in CoNLL style with PER/ORG/LOC entity types.
- Three native Yoruba speakers annotated the data manually and reached inter-annotator agreement above 0.70. The goal was consistent, high-quality labels across all domains.
- Cross-domain benchmarking of several transformer encoder models, including experiments with MasakhaNER 2.0 data, shows that African-centric models generally outperform general multilingual ones. Cross-domain performance still drops sharply, especially for the blog and movie domains.
- Domain-transfer experiments indicate that closely related formal domains (news and Wikipedia) transfer more effectively than other domain pairs, which underlines how sensitive Yoruba NER is to domain shift.
- The authors also release pretrained resources, including a Yoruba-specific language model (OyoBERT) that outperforms multilingual models on in-domain evaluation, and make YoNER publicly available.
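For readers unfamiliar with CoNLL-style NER data, the sketch below shows how such files are commonly read and scored. It assumes a two-column "token TAG" layout with BIO tags (B-PER, I-PER, O, ...) and blank lines between sentences; YoNER's exact column layout and tagging scheme are not stated in this summary. The toy sentence and the predicted tags are invented for illustration and are not taken from YoNER. The scoring function is exact-match, micro-averaged entity F1 in the spirit of conlleval/seqeval. This summary does not specify the paper's evaluation script, so treat the function as an illustration rather than the paper's method.

```python
from typing import List, Set, Tuple

# An entity span: (entity_type, start_token, end_token_exclusive)
Span = Tuple[str, int, int]

# Invented toy sentence ("Wole Soyinka went to Lagos."), NOT from YoNER.
# Assumed layout: one "token TAG" pair per line, blank line between sentences.
SAMPLE = """\
Wọlé B-PER
Ṣóyínká I-PER
lọ O
sí O
Èkó B-LOC
. O
"""

# Hypothetical model output that only tags the first token of the name.
PRED_TAGS = ["B-PER", "O", "O", "O", "B-LOC", "O"]


def parse_conll(text: str) -> List[Tuple[List[str], List[str]]]:
    """Split CoNLL-style text into (tokens, tags) pairs, one per sentence."""
    sentences: List[Tuple[List[str], List[str]]] = []
    tokens: List[str] = []
    tags: List[str] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:  # a blank line ends the current sentence
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        tokens.append(fields[0])  # first column: the word
        tags.append(fields[-1])   # last column: the NER tag
    if tokens:  # the file may not end with a blank line
        sentences.append((tokens, tags))
    return sentences


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # same entity keeps going
        if etype is not None:  # close the span that was open
            spans.add((etype, start, i))
        if tag == "O":
            start, etype = None, None
        else:  # "B-X", or a stray "I-X", opens a new span of type X
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match entity F1 over all sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    (_, gold_tags), = parse_conll(SAMPLE)
    print(sorted(bio_to_spans(gold_tags)))  # [('LOC', 4, 5), ('PER', 0, 2)]
    print(entity_f1([gold_tags], [PRED_TAGS]))  # 0.5
```

Because matching is exact at the span level, the prediction that captures only "Wọlé" earns no credit for the two-token PER entity. That is one reason entity-level scores can fall steeply when a model moves to an unfamiliar domain.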