Enhancing ASR Performance in the Medical Domain for Dravidian Languages
arXiv cs.CL / 4/23/2026
Key Points
- The paper addresses low-resource medical-domain ASR for Dravidian languages such as Telugu and Kannada, where limited annotated data and morphological complexity hinder performance.
- It introduces a confidence-aware training framework that fuses real and synthetic (TTS) speech using a hybrid confidence signal combining static perceptual/acoustic similarity metrics with dynamic model entropy.
- Instead of straightforward fine-tuning, the method aggregates the confidence signals with either fixed or learnable weights and uses the result to weight each training sample drawn from the heterogeneous data sources.
- Experiments on medical datasets containing both real recordings and TTS-generated audio show Telugu WER falling from 24.3% to 15.8% and Kannada WER from 31.7% to 25.4% (absolute reductions of 8.5 and 6.3 points, respectively).
- A 5-gram KenLM language model is applied for post-decoding correction, and the proposed hybrid approach outperforms standard fine-tuning baselines in this specialized domain.
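
The confidence-weighted training described in the key points can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the mixing weight `alpha`, the use of normalized Shannon entropy as the dynamic signal, and the example numbers are all assumptions chosen for illustration. It shows only the fixed-weight variant; a learnable variant would presumably treat `alpha` as a trainable parameter.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of a distribution, scaled to [0, 1] by log(len(probs))."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def hybrid_confidence(static_score, probs, alpha=0.5):
    """Fixed-weight mix of a static score in [0, 1] and an entropy-based score.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    dynamic_score = 1.0 - normalized_entropy(probs)  # low entropy -> high confidence
    return alpha * static_score + (1.0 - alpha) * dynamic_score

def weighted_loss(losses, confidences):
    """Confidence-weighted mean of per-sample losses."""
    total = sum(confidences)
    if total == 0.0:
        return 0.0
    return sum(c * l for c, l in zip(confidences, losses)) / total

# Hypothetical batch: one real recording and one TTS sample.
# The static scores and output distributions are made-up illustration values.
real_conf = hybrid_confidence(1.0, [0.97, 0.01, 0.01, 0.01])
tts_conf = hybrid_confidence(0.6, [0.25, 0.25, 0.25, 0.25])
batch_loss = weighted_loss([0.8, 2.0], [real_conf, tts_conf])
```

In this toy batch the TTS sample has a lower static score and a maximally uncertain model output, so it contributes less to the batch loss than the real recording does.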

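For readers who want to put the reported WER figures in context, the following sketch computes word error rate with a standard word-level edit distance and derives the relative reduction from the before/after numbers quoted in the summary. The helper names and the toy sentences are hypothetical; only the four WER percentages come from the source.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

def relative_reduction(before, after):
    """Fractional improvement of `after` relative to `before`."""
    return (before - after) / before

# Toy example: one substitution and one deletion over four reference words.
toy_wer = wer("a b c d", "a x c")

# WER percentages as reported in the summary.
telugu_rel = relative_reduction(24.3, 15.8)
kannada_rel = relative_reduction(31.7, 25.4)
```

Under these figures the relative WER reduction works out to roughly 35% for Telugu and roughly 20% for Kannada.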



