FLEURS-Kobani: Extending the FLEURS Dataset for Northern Kurdish
arXiv cs.CL / 4/1/2026
Key Points
- The paper introduces FLEURS-Kobani, a speech extension of the FLEURS benchmark that adds Northern Kurdish (ISO 639-3: kmr), an under-resourced language, to enable evaluation of automatic speech recognition (ASR), speech-to-text translation (S2TT), and speech-to-speech translation (S2ST).
- FLEURS-Kobani contains 5,162 validated utterances (18 hours 24 minutes) recorded by 31 native speakers and is publicly released under a CC BY 4.0 license for research use.
- The work provides baseline results by fine-tuning Whisper large-v3 for ASR and end-to-end (E2E) S2TT. A two-stage fine-tuning approach (Common Voice → FLEURS-Kobani) achieves a WER of 28.11 and a CER of 9.84 on the test set.
- For KMR→EN speech translation, Whisper reaches 8.68 BLEU on the test set. The paper also reports pivot-derived targets and a cascaded S2TT configuration to broaden the evaluation setups.
- FLEURS-Kobani is positioned as the first public Northern Kurdish benchmark, filling a gap in prior FLEURS coverage and supporting standardized benchmarking for multiple speech tasks.
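
The WER and CER figures above are edit-distance metrics. As a rough illustration of how such scores are usually computed, here is a minimal Python sketch using the standard corpus-level definition (total edits divided by total reference units). The helper names `edit_distance` and `error_rate` and the toy English sentences are assumptions made for illustration; the paper's own text normalization and scoring tool are not described in this summary and may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit):
    """Corpus-level error rate in percent: total edits / total reference units.

    unit="word" gives WER; unit="char" gives CER (spaces count as characters
    here, which is an assumption, not necessarily the paper's convention).
    """
    split = (lambda s: s.split()) if unit == "word" else list
    edits = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return 100.0 * edits / total


# Toy data, invented for illustration only (not from FLEURS-Kobani).
refs = ["the cat sat on the mat", "hello world"]
hyps = ["the cat sit on mat", "hello world"]

wer = error_rate(refs, hyps, "word")  # 2 word edits over 8 reference words
cer = error_rate(refs, hyps, "char")  # 5 character edits over 33 reference characters
print(f"WER {wer:.2f}  CER {cer:.2f}")
```

Because CER counts character-level edits, a single wrong letter costs far less than a whole wrong word, which is one reason CER is typically much lower than WER on the same output.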