Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages
arXiv cs.CL / 4/24/2026
Key Points
- The study proposes a controlled, multidimensional pairwise evaluation framework to reduce high variance when crowdsourcing preference judgments for multilingual TTS.
- Using 5K+ native and code-mixed sentences across 10 Indic languages, the authors benchmark 7 state-of-the-art TTS systems with 120K+ pairwise comparisons from 1,900+ native raters.
- Raters compare systems not only on overall preference but also on six perceptual dimensions: intelligibility, expressiveness, voice quality, liveliness, noise, and hallucinations.
- The paper builds a multilingual leaderboard via Bradley–Terry modeling and uses SHAP analysis plus reliability checks to connect human preferences to specific model strengths and trade-offs.
- The work highlights how linguistic diversity and multi-attribute perception can be jointly handled to produce more interpretable and dependable TTS evaluation results.
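
To make the leaderboard step concrete, here is a minimal sketch of how Bradley–Terry strengths can be estimated from pairwise outcomes. The summary does not say how the authors fit the model, so this uses the standard minorization-maximization (MM) update as an assumption. The function names, the system labels `A`/`B`/`C`, and the toy counts are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=200, tol=1e-9):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    MM update: p_i <- W_i / sum_j [ n_ij / (p_i + p_j) ], where W_i is the
    number of wins of system i and n_ij the number of comparisons between
    i and j. Strengths are renormalized to sum to 1 after every pass.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)
    systems = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        systems.update((winner, loser))
    systems = sorted(systems)

    # Start from a uniform guess.
    strengths = {s: 1.0 / len(systems) for s in systems}
    for _ in range(n_iter):
        updated = {}
        for i in systems:
            denom = 0.0
            for j in systems:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (strengths[i] + strengths[j])
            updated[i] = wins[i] / denom if denom else strengths[i]
        total = sum(updated.values())
        updated = {s: v / total for s, v in updated.items()}
        delta = max(abs(updated[s] - strengths[s]) for s in systems)
        strengths = updated
        if delta < tol:
            break
    return strengths


def win_probability(strengths, a, b):
    """Model-implied probability that system a is preferred over system b."""
    return strengths[a] / (strengths[a] + strengths[b])


# Hypothetical toy data: three systems, ten comparisons per pair.
toy = (
    [("A", "B")] * 8 + [("B", "A")] * 2
    + [("B", "C")] * 7 + [("C", "B")] * 3
    + [("A", "C")] * 9 + [("C", "A")] * 1
)
strengths = fit_bradley_terry(toy)
leaderboard = sorted(strengths, key=strengths.get, reverse=True)
print(leaderboard)
```

A system that never wins a comparison is driven toward zero strength under this update, so real leaderboards typically add regularization or report confidence intervals. Whether the paper does either is not stated in the summary.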



