PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech
arXiv cs.CL / 4/29/2026
Key Points
- The paper introduces PSP (Phoneme Substitution Profile), a new interpretable per-phonological-dimension benchmark for measuring accent quality in Indic text-to-speech beyond standard intelligibility and naturalness metrics.
- PSP breaks accent into six dimensions (retroflex collapse rate, aspiration fidelity, vowel-length fidelity, Tamil-zha fidelity, Fréchet Audio Distance, and prosodic signature divergence) and scores them using forced alignment with native-speaker centroid probes plus corpus-level distributional distance measures.
- Version 1 of the benchmark evaluates five systems (ElevenLabs v3, Cartesia, Sarvam Bulbul, Indic Parler-TTS, and Praxy Voice) on Hindi, Telugu, and Tamil pilot sets, and studies an additional Telugu case (R5->R6).
- Results show accent difficulty increases monotonically (Hindi < Telugu < Tamil), PSP rankings can diverge from WER-based rankings, and no single TTS system is best across all six accent dimensions.
- The authors release reference centroids, embeddings, prosodic feature matrices, golden sets, and MIT-licensed scoring code to support further reproducible accent-focused evaluation (a MOS-correlation study is planned for v2).
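
To make the "centroid probe" idea above concrete, here is a minimal Python sketch of how a retroflex collapse rate could be computed. It is not the authors' scoring code. It assumes that "collapse" means a segment whose target phoneme is retroflex lands closer to a native-speaker dental centroid than to the retroflex centroid, that nearest-centroid Euclidean distance is the probe, and that a forced aligner (not shown) has already isolated the segments. The function names and the 2-D feature vectors are hypothetical and invented for illustration; they are not data from the paper.

```python
import math


def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_label(vec, centroids):
    """Label of the centroid closest to vec (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def retroflex_collapse_rate(segments, centroids):
    """Fraction of target-retroflex segments that fall nearer the dental centroid.

    segments: feature vectors for TTS segments whose *target* phoneme is
    retroflex, as located by a forced aligner (assumed, not shown here).
    """
    if not segments:
        raise ValueError("no retroflex segments to score")
    collapsed = sum(1 for v in segments if nearest_label(v, centroids) == "dental")
    return collapsed / len(segments)


# Toy native-speaker features (hypothetical values, illustration only).
native_retroflex = [[1.0, 0.0], [0.8, 0.2], [1.2, -0.2]]
native_dental = [[-1.0, 0.0], [-0.8, 0.2], [-1.2, -0.2]]
centroids = {
    "retroflex": centroid(native_retroflex),
    "dental": centroid(native_dental),
}

# Toy TTS segments that were all *supposed* to be retroflex.
tts_segments = [[0.9, 0.1], [-0.7, 0.0], [0.6, -0.1], [-0.9, 0.3]]
rate = retroflex_collapse_rate(tts_segments, centroids)
print(f"retroflex collapse rate: {rate:.2f}")
```

In this toy setup two of the four segments sit nearer the dental centroid, so half of the intended retroflexes count as collapsed. A per-dimension score of this kind is interpretable in a way a single WER number is not, which is consistent with the summary's point that PSP and WER rankings can diverge.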