"OK Aura, Be Fair With Me": Demographics-Agnostic Training for Bias Mitigation in Wake-up Word Detection

arXiv cs.CL / 4/8/2026


Key Points

  • The paper addresses demographic bias in wake-up word detection and tests whether demographics-agnostic training can improve fairness across sex, age, and accent groups.
  • Experiments use the OK Aura database, with demographic labels excluded during training and reserved only for evaluation to avoid directly optimizing for fairness labels.
  • The study evaluates two approaches: speech data augmentation to improve generalization and knowledge distillation from pre-trained foundational speech models to transfer robust representations.
  • Results show large reductions in performance disparities, including a technique that cuts predictive disparity by 39.94% (sex), 83.65% (age), and 40.48% (accent) versus a baseline.
  • Overall, the findings suggest label-agnostic methodologies can measurably reduce demographic bias and produce a more equitable wake-up word detection profile.
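
The evaluation protocol in the points above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the summary does not define Predictive Disparity, so the sketch assumes a simple gap measure (best group accuracy minus worst group accuracy). The function names `group_accuracy`, `disparity`, and `relative_reduction` are hypothetical. The demographic labels appear only at evaluation time, in line with the setup described.

```python
from collections import defaultdict


def group_accuracy(predictions, targets, groups):
    """Per-group accuracy. Group labels are used here only, never in training."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pred, target, group in zip(predictions, targets, groups):
        totals[group] += 1
        hits[group] += int(pred == target)
    return {g: hits[g] / totals[g] for g in totals}


def disparity(per_group):
    """Assumed gap measure: best group score minus worst group score."""
    values = list(per_group.values())
    return max(values) - min(values)


def relative_reduction(baseline_gap, new_gap):
    """Percentage by which new_gap is smaller than baseline_gap."""
    if baseline_gap == 0:
        raise ValueError("baseline gap is zero; reduction is undefined")
    return 100.0 * (baseline_gap - new_gap) / baseline_gap
```

Under this assumed definition, a figure such as the reported 83.65% for age would mean that the gap between the best and worst age groups shrank to roughly one sixth of the baseline gap.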

Abstract

Voice-based interfaces are widely used; however, achieving fair Wake-up Word detection across diverse speaker populations remains a critical challenge due to persistent demographic biases. This study evaluates the effectiveness of demographics-agnostic training techniques in mitigating performance disparities among speakers of varying sex, age, and accent. We utilize the OK Aura database for our experiments, employing a training methodology that excludes demographic labels, which are reserved for evaluation purposes. We explore (i) data augmentation techniques to enhance model generalization and (ii) knowledge distillation of pre-trained foundational speech models. The experimental results indicate that these demographics-agnostic training techniques markedly reduce demographic bias, leading to a more equitable performance profile across different speaker groups. Specifically, one of the evaluated techniques achieves a Predictive Disparity reduction of 39.94% for sex, 83.65% for age, and 40.48% for accent when compared to the baseline. This study highlights the effectiveness of label-agnostic methodologies in fostering fairness in Wake-up Word detection.
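
To make the knowledge distillation idea in (ii) concrete, here is a minimal sketch of one common formulation: a small student detector is trained on the wake-up word task while its internal embedding is also pulled toward the embedding of a frozen teacher model. The abstract does not specify the paper's distillation objective, so the representation-matching term, the weighting parameter `alpha`, and all function names below are assumptions for illustration. No demographic label enters the loss.

```python
import math


def binary_cross_entropy(prob, label, eps=1e-7):
    """Task loss for a single utterance: wake-up word present (1) or absent (0)."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def mean_squared_error(student_vec, teacher_vec):
    """Representation-matching term between student and teacher embeddings."""
    if len(student_vec) != len(teacher_vec):
        raise ValueError("embeddings must have the same dimension")
    squared = [(s - t) ** 2 for s, t in zip(student_vec, teacher_vec)]
    return sum(squared) / len(squared)


def distillation_loss(student_prob, label, student_emb, teacher_emb, alpha=0.5):
    """Task loss plus alpha times the embedding mismatch (alpha is hypothetical)."""
    task = binary_cross_entropy(student_prob, label)
    match = mean_squared_error(student_emb, teacher_emb)
    return task + alpha * match
```

In this sketch the teacher contributes only through `teacher_emb`, so the student can stay small enough for on-device detection while still being nudged toward the teacher's representation.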