"This Wasn't Made for Me": Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias

arXiv cs.CL / 4/24/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • Research on ASR bias has often centered on error rates for underrepresented dialects, but this study examines the human and emotional consequences of those system failures.
  • User experience studies conducted in four U.S. locations show that many participants feel ASR does not account for their cultural backgrounds and that they must constantly adjust how they speak to get basic functionality.
  • Although participants report frustration and annoyance—and sometimes a sense of personal inadequacy—they still hold high expectations for ASR and are willing to help improve models.
  • Qualitative findings highlight “invisible labor” such as code-switching, hyper-articulation, and emotional management; the authors argue that fairness evaluations based on accuracy alone miss key harms such as emotional labor, cognitive burden, and psychological toll.

Abstract

Studies on bias in Automatic Speech Recognition (ASR) tend to focus on reporting error rates for speakers of underrepresented dialects, yet less research examines the human side of system bias: how do system failures shape users' lived experiences, how do users feel about and react to them, and what emotional toll do these repeated failures exact? We conducted user experience studies across four U.S. locations (Atlanta, Gulf Coast, Miami Beach, and Tucson) representing distinct English dialect communities. Our findings reveal that most participants report technologies fail to consider their cultural backgrounds and require constant adjustment to achieve basic functionality. Despite these experiences, participants maintain high expectations for ASR performance and express strong willingness to contribute to model improvement. Qualitative analysis of open-ended narratives exposes the deeper costs of these failures. Participants report frustration, annoyance, and feelings of inadequacy, yet the emotional impact extends beyond momentary reactions. Participants recognize that systems were not designed for them, yet often internalize failures as personal inadequacy despite this critical awareness. They perform extensive invisible labor, including code-switching, hyper-articulation, and emotional management, to make failing systems functional. Meanwhile, their linguistic and cultural knowledge remains unrecognized by technologies that encode particular varieties as standard while rendering others marginal. These findings demonstrate that algorithmic fairness assessments based on accuracy metrics alone miss critical dimensions of harm: the emotional labor of managing repeated technological rejection, the cognitive burden of constant self-monitoring, and the psychological toll of feeling inadequate in one's native language variety.
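For context, the "accuracy metrics alone" the authors critique usually take the form of per-group word error rate (WER) comparisons. Below is a minimal sketch of such a check; the dialect labels and transcript pairs are hypothetical placeholders, not data from the study, and the point of the paper is precisely that a gap measured this way understates the burden carried by speakers.

```python
# Minimal sketch of an accuracy-only fairness check: compare word error rate (WER)
# across dialect groups. All data below is hypothetical illustration.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical (reference, ASR hypothesis) pairs grouped by dialect label.
samples = {
    "dialect_A": [("turn the lights on", "turn the lights on")],
    "dialect_B": [("turn the lights on", "turn the light zone")],
}

for group, pairs in samples.items():
    avg = sum(wer(r, h) for r, h in pairs) / len(pairs)
    print(f"{group}: mean WER = {avg:.2f}")
```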