HATS: An Open data set Integrating Human Perception Applied to the Evaluation of Automatic Speech Recognition Metrics
arXiv cs.CL / 5/1/2026
Key Points
- The paper argues that standard ASR evaluation, and word error rate (WER) in particular, is too limited to reflect how human users actually perceive transcription quality.
- It introduces HATS, a newly released, manually annotated French dataset capturing human perception of transcription errors across outputs from multiple ASR systems.
- To build the dataset, 143 human annotators chose which of two ASR hypotheses they preferred, enabling study of how human judgments align with different metric types (see the first sketch after this list).
- The research analyzes how human preferences relate to both lexical and embedding-based metrics (including BERTScore and semantic distance), asking which metrics correlate best with perceived quality; the second sketch below shows how an embedding-based score slots into the same pairwise comparison.
- Overall, the work provides data and analysis to move ASR metric evaluation closer to human perception rather than purely transcript- or system-oriented scoring.
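To make the pairwise-preference setup concrete, here is a minimal Python sketch of the kind of analysis the key points describe: score two competing hypotheses against the reference with WER and check whether the metric's preference matches the human annotator's choice. The toy sentences, item layout, and agreement criterion are illustrative assumptions, not the HATS release format.

```python
# Minimal sketch (not the HATS release code): how often does a metric's
# pairwise preference match a human annotator's choice between two ASR
# hypotheses? All data below is invented for illustration.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Each toy item: reference transcript, two system hypotheses, and the index
# (0 or 1) of the hypothesis a hypothetical human annotator preferred.
items = [
    ("the cat sat on the mat", "the cat sat on a mat", "a cat sat on mat", 0),
    # WER rates "police call me back soon" (one substitution) better than
    # "please call me" (two deletions), but the substitution wrecks the meaning.
    ("please call me back soon", "please call me", "police call me back soon", 0),
]

agree = 0
for ref, hyp_a, hyp_b, human_pick in items:
    metric_pick = 0 if wer(ref, hyp_a) < wer(ref, hyp_b) else 1
    agree += metric_pick == human_pick

print(f"WER agrees with the human preference on {agree}/{len(items)} pairs")
```

The second item is built to show the divergence the paper targets: pure edit-distance scoring prefers the hypothesis with the semantically damaging "police" substitution, while a human listener plausibly would not.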
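A companion sketch swaps an embedding-based score into the same comparison, reusing the second toy pair above. It assumes the sentence-transformers library; the model name and the cosine-similarity criterion are my choices for illustration (BERTScore, which the paper also studies, could be slotted in the same way).

```python
# Sketch: pairwise preference under a semantic (embedding-based) score.
# Assumes `pip install sentence-transformers`; model choice is illustrative.
from sentence_transformers import SentenceTransformer, util

# A multilingual model, so French transcripts like those in HATS work too.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def semantic_similarity(reference: str, hypothesis: str) -> float:
    """Cosine similarity between sentence embeddings (higher is better)."""
    ref_emb, hyp_emb = model.encode([reference, hypothesis], convert_to_tensor=True)
    return util.cos_sim(ref_emb, hyp_emb).item()

ref = "please call me back soon"
hyp_a, hyp_b = "please call me", "police call me back soon"
# Unlike WER, a semantic score can favor the hypothesis that preserves the
# meaning even though it drops more words.
pick = 0 if semantic_similarity(ref, hyp_a) > semantic_similarity(ref, hyp_b) else 1
print(f"embedding metric prefers hypothesis {'ab'[pick]}")
```

Running the two sketches over the same pairs gives per-metric agreement rates with the human choices, which is essentially the correlation question the paper investigates at scale.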