Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment

arXiv cs.AI / 4/10/2026

Key Points

  • Harf-Speech is introduced as a modular framework for Arabic phoneme-level pronunciation assessment, aimed at supporting scalable speech therapy and language learning, where validated Arabic tools remain scarce.
  • The system combines an MSA phonetizer, a fine-tuned speech-to-phoneme model, Levenshtein-style alignment, and a blended scorer based on longest common subsequence (LCS) and edit-distance metrics (see the sketch after this list).
  • Three Arabic ASR architectures are fine-tuned on phoneme data and benchmarked against zero-shot multimodal models, with the best, OmniASR-CTC-1B-v2, achieving an 8.92% phoneme error rate (PER).
  • For clinical validation, three certified speech-language pathologists independently scored 40 utterances; Harf-Speech's scores correlate with mean expert ratings (Pearson r = 0.791, ICC(2,1) = 0.659) and outperform prior end-to-end assessment frameworks.
  • The reported results position Harf-Speech as yielding scores comparable to inter-rater expert agreement, emphasizing clinical alignment rather than only generic pronunciation scoring accuracy.
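
As a rough illustration of the alignment-and-scoring stage, the sketch below blends an LCS ratio with a normalized edit-distance similarity over phoneme token sequences. The paper does not publish its exact formula, so the `blended_score` function, its `alpha` mixing weight, and the toy phoneme strings are hypothetical; the PER definition (edit distance normalized by reference length) is the standard one.

```python
def levenshtein(ref: list[str], hyp: list[str]) -> int:
    """Edit distance between two phoneme sequences (single-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1]

def lcs_length(ref: list[str], hyp: list[str]) -> int:
    """Length of the longest common subsequence of phonemes."""
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, h in enumerate(hyp, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if r == h else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def per(ref: list[str], hyp: list[str]) -> float:
    """Phoneme error rate: edit distance over reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)

def blended_score(ref: list[str], hyp: list[str], alpha: float = 0.5) -> float:
    """Hypothetical blend of LCS ratio and edit-distance similarity in [0, 1].
    `alpha` is an illustrative mixing weight, not a value from the paper."""
    lcs_ratio = lcs_length(ref, hyp) / max(len(ref), 1)
    edit_sim = 1.0 - levenshtein(ref, hyp) / max(len(ref), len(hyp), 1)
    return alpha * lcs_ratio + (1.0 - alpha) * edit_sim

# Toy example: reference vs. a hypothesis with one substituted phoneme.
ref = ["s", "a", "l", "aː", "m"]
hyp = ["s", "a", "r", "aː", "m"]
print(f"PER = {per(ref, hyp):.2f}, blended score = {blended_score(ref, hyp):.2f}")
```

For this toy pair, one substituted phoneme out of five yields PER = 0.20 and a blended score of 0.80; the actual system maps such scores onto a clinical scale.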

Abstract

Automated phoneme-level pronunciation assessment is vital for scalable speech therapy and language learning, yet validated tools for Arabic remain scarce. We present Harf-Speech, a modular system scoring Arabic pronunciation at the phoneme level on a clinical scale. It combines an MSA phonetizer, a fine-tuned speech-to-phoneme model, Levenshtein alignment, and a blended scorer using longest common subsequence and edit-distance metrics. We fine-tune three ASR architectures on Arabic phoneme data and benchmark them against zero-shot multimodal models; the best, OmniASR-CTC-1B-v2, achieves 8.92% phoneme error rate. Three certified speech-language pathologists independently scored 40 utterances for clinical validation. Harf-Speech attains a Pearson correlation of 0.791 and ICC(2,1) of 0.659 with mean expert scores, outperforming existing end-to-end assessment frameworks. These results show Harf-Speech yields clinically aligned, interpretable scores comparable to inter-rater expert agreement.
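
To make the validation metrics concrete, the following sketch computes a Pearson correlation and ICC(2,1) (the standard two-way random-effects, absolute-agreement, single-rater form from Shrout and Fleiss) from a ratings matrix. The ratings and system scores below are invented for illustration, and pairing the system with the mean expert score as two "raters" is an assumption about how the paper applied the metric.

```python
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` has shape (n_subjects, k_raters)."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-subject means
    col_means = ratings.mean(axis=0)   # per-rater means
    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)              # between-subjects mean square
    ms_c = ss_cols / (k - 1)              # between-raters mean square
    ms_e = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Illustrative data: 3 SLPs rating 5 utterances (the paper used 40).
expert = np.array([[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3], [1, 2, 2]], float)
system = np.array([4.2, 2.1, 4.8, 3.0, 1.5])

# Pearson correlation between system scores and mean expert scores.
r = np.corrcoef(system, expert.mean(axis=1))[0, 1]
print(f"Pearson r = {r:.3f}")

# Agreement between system and mean expert scores, each treated as one rater.
pair = np.column_stack([system, expert.mean(axis=1)])
print(f"ICC(2,1) = {icc_2_1(pair):.3f}")
```

Unlike Pearson correlation, which is invariant to scale and offset, ICC(2,1) penalizes absolute disagreement, which is why the paper reports both.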