Ara-BEST-RQ: Multi-Dialectal Arabic SSL

arXiv cs.CL / 3/24/2026


Key Points

  • Ara-BEST-RQ is a family of self-supervised learning models tailored to multi-dialect Arabic speech processing and evaluated on downstream tasks such as dialect identification (DID) and automatic speech recognition (ASR).
  • The work pre-trains conformer-based BEST-RQ models of up to 600M parameters on 5,640 hours of crawled Creative Commons Arabic speech combined with publicly available datasets (a sketch of the BEST-RQ objective follows this list).
  • Results show state-of-the-art performance for dialect identification while using fewer parameters than competing approaches.
  • The authors find that dialect-family-targeted pre-training for Arabic improves downstream performance versus multilingual or monolingual models trained on non-Arabic data.
  • All models, code, and pre-processed datasets are planned for public release to enable reproducibility and further research.
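
The BEST-RQ objective mentioned above trains the conformer to predict discrete targets produced by a frozen random-projection quantizer at masked frame positions. The following is a minimal sketch of that target-generation step, assuming log-mel input features; the dimensions, codebook size, and masking parameters are illustrative placeholders, not the configuration used for Ara-BEST-RQ.

```python
# Illustrative BEST-RQ-style target generation; all hyperparameters below
# are assumptions for demonstration, not the paper's actual settings.
import torch


class RandomProjectionQuantizer:
    """Maps speech frames to discrete labels using a frozen random projection
    and a frozen random codebook, as in BEST-RQ-style pre-training targets."""

    def __init__(self, feat_dim=80, code_dim=16, codebook_size=8192, seed=0):
        g = torch.Generator().manual_seed(seed)
        # Both tensors are sampled once and never trained.
        self.projection = torch.randn(feat_dim, code_dim, generator=g)
        codebook = torch.randn(codebook_size, code_dim, generator=g)
        self.codebook = torch.nn.functional.normalize(codebook, dim=-1)

    def __call__(self, feats):
        # feats: (batch, time, feat_dim) log-mel frames.
        proj = torch.nn.functional.normalize(feats @ self.projection, dim=-1)
        # Nearest codebook entry (by cosine similarity) is the discrete target.
        return (proj @ self.codebook.T).argmax(dim=-1)   # (batch, time)


def random_span_mask(batch, time, mask_prob=0.01, mask_len=10):
    """Sample mask-start frames and expand each into a fixed-length span."""
    starts = torch.rand(batch, time) < mask_prob
    mask = torch.zeros(batch, time, dtype=torch.bool)
    for offset in range(mask_len):
        mask[:, offset:] |= starts[:, : time - offset]
    return mask


if __name__ == "__main__":
    quantizer = RandomProjectionQuantizer()
    feats = torch.randn(2, 300, 80)        # fake 3-second log-mel batch
    targets = quantizer(feats)             # (2, 300) integer labels
    mask = random_span_mask(2, 300)
    # During pre-training, the conformer receives the masked features and is
    # trained with cross-entropy to predict `targets` at masked positions only.
    print(targets.shape, mask.float().mean().item())
```

Because both the projection and the codebook stay frozen, the targets cost almost nothing to compute and cannot collapse during training, which is what makes the recipe attractive for large-scale pre-training on crawled speech.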

Abstract

We present Ara-BEST-RQ, a family of self-supervised learning (SSL) models specifically designed for multi-dialectal Arabic speech processing. Leveraging 5,640 hours of crawled Creative Commons speech and combining it with publicly available datasets, we pre-train conformer-based BEST-RQ models up to 600M parameters. Our models are evaluated on dialect identification (DID) and automatic speech recognition (ASR) tasks, achieving state-of-the-art performance on the former while using fewer parameters than competing models. We demonstrate that family-targeted pre-training on Arabic dialects significantly improves downstream performance compared to multilingual or monolingual models trained on non-Arabic data. All models, code, and pre-processed datasets will be publicly released to support reproducibility and further research in Arabic speech technologies.
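
For the downstream dialect-identification evaluation, a common pattern is to attach a small classification head to the pre-trained encoder and fine-tune. The sketch below shows that pattern only; the encoder stub, mean pooling, and number of dialect classes are illustrative assumptions, not the paper's fine-tuning recipe.

```python
# Hypothetical DID head on top of a pre-trained speech encoder; the encoder
# stub and class count are placeholders for demonstration purposes.
import torch
import torch.nn as nn


class DialectClassifier(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_dim: int, num_dialects: int):
        super().__init__()
        self.encoder = encoder              # e.g. a pre-trained conformer
        self.head = nn.Linear(hidden_dim, num_dialects)

    def forward(self, feats):
        # feats: (batch, time, feat_dim); encoder returns frame-level states.
        hidden = self.encoder(feats)        # (batch, time, hidden_dim)
        pooled = hidden.mean(dim=1)         # utterance-level representation
        return self.head(pooled)            # (batch, num_dialects) logits


if __name__ == "__main__":
    # Stand-in encoder; in practice this would be a pre-trained SSL model.
    dummy_encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU())
    model = DialectClassifier(dummy_encoder, hidden_dim=256, num_dialects=8)
    logits = model(torch.randn(2, 300, 80))
    print(logits.shape)                     # torch.Size([2, 8])
```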