Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis
arXiv cs.CL / 4/1/2026
Key Points
- Habibi is an open-source, unified-dialect Arabic text-to-speech (TTS) framework designed to cover 12+ regional dialects despite major cross-dialect lexical/phonological gaps.
- The system converts open-source automatic speech recognition (ASR) corpora into TTS training data through a multi-step curation pipeline. A linguistically informed curriculum learning strategy enables robust zero-shot dialectal synthesis without text diacritization.
- The release includes the first standardized multi-dialect Arabic TTS benchmark (11,000+ utterances across 7 dialect subsets) with manually verified transcripts.
- On the benchmark, Habibi’s unified model matches or surpasses per-dialect specialized models. Both automatic and human evaluations show it is competitive with ElevenLabs’ Eleven v3 (alpha) on intelligibility, speaker similarity, and naturalness.
- The authors open-source all checkpoints, training and inference code, and benchmark data, and report extensive ablation studies that used roughly 8,000 H100 GPU hours.
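
The key points mention repurposing ASR corpora into TTS training data through a multi-step curation pipeline, but this summary does not list the steps. As a purely illustrative sketch, one could imagine a filter that drops clips unsuitable for synthesis. Everything below is an assumption rather than the authors' method: the `Utterance` fields, the `quality` score, the `curate` helper, and all thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One ASR clip. All fields are hypothetical, not taken from the paper."""
    dialect: str       # dialect label attached to the clip
    text: str          # transcript, undiacritized
    duration_s: float  # clip length in seconds
    quality: float     # assumed 0..1 audio/transcript quality score


def curate(utterances, min_s=1.0, max_s=20.0, min_quality=0.8):
    """Keep clips that pass three illustrative checks, applied in order.

    The thresholds are placeholders chosen for the example only.
    """
    kept = []
    for u in utterances:
        if not u.text.strip():
            continue  # step 1: drop clips with an empty transcript
        if not (min_s <= u.duration_s <= max_s):
            continue  # step 2: drop clips that are too short or too long
        if u.quality < min_quality:
            continue  # step 3: drop clips below the assumed quality floor
        kept.append(u)
    return kept


if __name__ == "__main__":
    sample = [
        Utterance("egyptian", "مرحبا", 3.0, 0.9),
        Utterance("egyptian", "", 3.0, 0.9),
        Utterance("gulf", "شلونك", 0.5, 0.9),
        Utterance("levantine", "كيفك", 5.0, 0.5),
    ]
    for u in curate(sample):
        print(u.dialect, u.duration_s)
```

A real pipeline would likely involve more than per-clip thresholds; the sketch only shows the general shape of staged filtering.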




