Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India

arXiv cs.CL / 4/22/2026

Key Points

  • The article introduces Voice of India, a large-scale, closed-source benchmark aimed at evaluating real-world speech recognition for Indian languages using unscripted telephonic conversations rather than scripted speech.
  • It covers 15 major Indian languages across 139 regional clusters and includes 306,230 utterances totaling 536 hours from 36,691 speakers, with transcripts designed to reflect real spelling variations.
  • The benchmark is intended to reduce dataset-specific overfitting and better reflect natural language phenomena that can be unfairly penalized by strict single-reference WER in Indic and code-mixed settings.
  • The authors analyze ASR performance geographically at the district level and across factors including audio quality, speaking rate, gender, and device type, identifying where current systems underperform.
  • They conclude by offering actionable insights for improving Indic ASR systems for real-world deployment across diverse regions and recording conditions.

Abstract

Existing Indic ASR benchmarks often use scripted, clean speech and leaderboard-driven evaluation that encourages dataset-specific overfitting. In addition, strict single-reference WER penalizes natural spelling variation in Indian languages, including non-standardized spellings of code-mixed English-origin words. To address these limitations, we introduce Voice of India, a closed-source benchmark built from unscripted telephonic conversations covering 15 major Indian languages across 139 regional clusters. The dataset contains 306,230 utterances, totaling 536 hours of speech from 36,691 speakers, with transcripts that account for spelling variations. We also analyze performance geographically at the district level, revealing disparities. Finally, we provide detailed analysis across factors such as audio quality, speaking rate, gender, and device type, highlighting where current ASR systems struggle and offering insights for improving real-world Indic ASR systems.
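
To make the single-reference WER problem concrete, here is a minimal sketch of a variant-tolerant WER. It is an illustration only, not the scoring procedure used by the authors, which this summary does not describe. The function name `wer_with_variants`, the idea of storing a set of accepted spellings per reference word, and the romanized example sentence are all assumptions made for the example.

```python
from typing import Sequence, Set


def wer_with_variants(reference: Sequence[Set[str]], hypothesis: Sequence[str]) -> float:
    """Word error rate where each reference slot accepts any spelling in its set.

    Hypothetical helper for illustration; not taken from the Voice of India paper.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j].
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            # A hypothesis word matches if it is any accepted spelling for this slot.
            sub_cost = 0 if hypothesis[j - 1] in reference[i - 1] else 1
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra word in hypothesis
                prev[j - 1] + sub_cost,  # match or substitution
            )
        prev = curr
    return prev[m] / n


# Invented example: a code-mixed sentence where "doctor" has no standard romanized spelling.
hypothesis = "mujhe daktar se milna hai".split()

# Strict single-reference scoring: exactly one spelling per word.
strict_ref = [{"mujhe"}, {"doctor"}, {"se"}, {"milna"}, {"hai"}]

# Variant-tolerant scoring: the same transcript with accepted alternatives listed.
lenient_ref = [{"mujhe"}, {"doctor", "daktar", "dactar"}, {"se"}, {"milna"}, {"hai"}]

strict_wer = wer_with_variants(strict_ref, hypothesis)
lenient_wer = wer_with_variants(lenient_ref, hypothesis)
print(f"strict WER:  {strict_wer:.2f}")
print(f"lenient WER: {lenient_wer:.2f}")
```

In this toy case the strict reference counts the alternate spelling as a substitution (one error in five words), while the variant-aware reference counts it as correct. Representing variants per word keeps the standard edit-distance recurrence unchanged; only the match test differs.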