Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition
arXiv cs.CL / 4/13/2026
Key Points
- The paper argues that standard ASR evaluation via word error rate (WER) can miss sentence-level semantic errors, motivating semantic-aware assessment beyond token accuracy.
- It introduces an agentic interactive ASR framework that uses an LLM-as-a-judge to evaluate semantic coherence and recognition quality.
- The authors also design an LLM-driven multi-turn interaction mechanism to simulate human-like correction, iteratively refining ASR outputs using semantic feedback.
- Experiments on benchmarks including GigaSpeech (English), WenetSpeech (Chinese), and the ASRU 2019 code-switching corpus show gains in semantic fidelity and interactive correction, assessed with both objective and subjective measures.
- The authors plan to release code to support further research in interactive and agentic speech recognition systems.
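The WER limitation the paper points to is easy to demonstrate: two hypotheses can have identical word error rates while one preserves the sentence's meaning and the other inverts it. The sketch below (a standard word-level edit-distance WER, not code from the paper; the example sentences are illustrative) shows a spelling error and a dropped negation scoring the same.

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # DP table: d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

ref         = "the patient is not allergic to penicillin"
hyp_typo    = "the patient is not allergic to penicilin"  # minor spelling slip
hyp_negated = "the patient is allergic to penicillin"     # dropped "not": meaning flipped

print(round(wer(ref, hyp_typo), 3))     # 1 substitution / 7 words ≈ 0.143
print(round(wer(ref, hyp_negated), 3))  # 1 deletion     / 7 words ≈ 0.143
```

Both hypotheses score WER ≈ 0.143, yet only the second changes the clinical meaning of the utterance, which is exactly the class of error a token-level metric cannot distinguish and a semantic-coherence judge can.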