Explainable Speech Emotion Recognition: Weighted Attribute Fairness to Model Demographic Contributions to Social Bias

arXiv cs.CL / 4/23/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper addresses fairness risks in Speech Emotion Recognition (SER) systems used in sensitive domains like mental health and education.
  • It argues that common fairness metrics (e.g., Equalised Odds, Demographic Parity) can miss how demographic attributes jointly influence model predictions.
  • The authors propose a weighted attribute fairness method that learns the joint relationship between demographic attributes and model error in order to capture allocative bias (a toy sketch of this idea follows the list).
  • They validate the approach on synthetic data and apply it to SER models fine-tuned from HuBERT and WavLM on the CREMA-D dataset (a minimal fine-tuning sketch also follows the list).
  • The findings suggest the method better captures mutual information between protected attributes and bias, provides attribute-level bias contribution estimates, and shows evidence of gender bias in both HuBERT- and WavLM-based models.
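
The core proposal can be pictured with a toy example. The sketch below is not the authors' exact formulation; the attribute names, error rates, and the logistic-regression choice are illustrative assumptions. It contrasts a single-attribute error-rate gap with a model that learns the joint relationship between demographic attributes and per-sample error, reading rough attribute-level bias contributions off the fitted coefficients.

```python
# Illustrative sketch only -- the paper's weighted attribute fairness model is not
# reproduced here. It shows the general idea: instead of comparing error rates one
# attribute at a time, fit a model of the per-sample error indicator on the joint
# demographic attributes and inspect attribute-level contributions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Hypothetical demographic attributes.
gender = rng.integers(0, 2, n)   # 0 = female, 1 = male
age = rng.integers(0, 3, n)      # three age bands

# Simulated per-sample error indicator of an SER model, with an injected bias
# that depends on gender and on the gender-age interaction.
p_err = 0.10 + 0.08 * gender + 0.05 * (gender * (age == 2))
error = rng.binomial(1, p_err)

# Single-attribute view: error-rate gap across gender only.
# Interaction effects with age are invisible to this kind of per-attribute gap.
gap = abs(error[gender == 1].mean() - error[gender == 0].mean())
print(f"error-rate gap (gender only): {gap:.3f}")

# Joint view: model the error indicator on all attributes plus an interaction term.
X = np.column_stack([gender, age, gender * age])
clf = LogisticRegression(max_iter=1000).fit(X, error)

# Coefficient magnitudes give a rough per-attribute "contribution to bias".
for name, coef in zip(["gender", "age", "gender x age"], clf.coef_[0]):
    print(f"{name:>14}: {coef:+.3f}")
```

The point of the toy example is only that a joint model can attribute error to combinations of attributes that single-attribute gaps average away.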

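For context on the evaluated models, a minimal fine-tuning setup of the kind the paper studies might look as follows; the checkpoint name, six-class head, and preprocessing are assumptions rather than the authors' training recipe.

```python
# Hedged sketch of an SSL-based SER classifier (HuBERT shown; WavLM is analogous).
# The checkpoint and the six-class head follow CREMA-D's emotion labels, but the
# authors' actual fine-tuning recipe is not reproduced here.
import torch
from transformers import AutoFeatureExtractor, HubertForSequenceClassification

checkpoint = "facebook/hubert-base-ls960"  # assumed base model
extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = HubertForSequenceClassification.from_pretrained(checkpoint, num_labels=6)

# A dummy one-second, 16 kHz waveform standing in for a CREMA-D utterance.
waveform = torch.randn(16000)
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, 6) emotion scores
print(logits.argmax(dim=-1))
```
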
Abstract

Speech Emotion Recognition (SER) systems have growing applications in sensitive domains such as mental health and education, where biased predictions can cause harm. Traditional fairness metrics, such as Equalised Odds and Demographic Parity, often overlook the joint dependency between demographic attributes and model predictions. We propose a fairness modelling approach for SER that explicitly captures allocative bias by learning the joint relationship between demographic attributes and model error. We validate our fairness metric on synthetic data, then apply it to evaluate HuBERT and WavLM models fine-tuned on the CREMA-D dataset. Our results indicate that the proposed fairness model captures more mutual information between protected attributes and biases, and quantifies the absolute contribution of individual attributes to bias in SSL-based SER models. Additionally, our analysis reveals indications of gender bias in both HuBERT and WavLM.
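
To make the mutual-information claim concrete, the following toy example (an illustrative assumption about the setup, not the paper's estimator) shows how a joint encoding of protected attributes can carry more information about model error than any single attribute:

```python
# Toy illustration: when bias is driven by a combination of attributes, the joint
# attribute encoding shares more mutual information with the error indicator than
# either attribute alone -- the dependency that per-attribute metrics can miss.
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(1)
n = 20000

gender = rng.integers(0, 2, n)
age = rng.integers(0, 3, n)

# Error rate elevated only for one gender-age combination.
p_err = 0.10 + 0.10 * ((gender == 1) & (age == 2))
error = rng.binomial(1, p_err)

joint = gender * 3 + age  # single label encoding the attribute combination

print("MI(error; gender):", mutual_info_score(error, gender))
print("MI(error; age):   ", mutual_info_score(error, age))
print("MI(error; joint): ", mutual_info_score(error, joint))
```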