Exploring the impact of fairness-aware criteria in AutoML

arXiv cs.LG / 4/14/2026


Key Points

  • AutoML systems used for high-impact decisions can amplify discrimination if they primarily optimize for predictive performance using biased data.
  • The paper investigates adding fairness-aware criteria directly into the optimisation step of an AutoML pipeline that spans data selection/transformations through model selection and tuning.
  • Because fairness metrics can represent different notions of “fairness,” the authors integrate complementary fairness metrics during optimisation to better capture multiple fairness dimensions.
  • Results show measurable trade-offs versus a predictive-performance-only baseline: predictive power drops by 9.4% while average fairness improves by 14.5%, and data usage decreases by 35.7%.
  • Fairness-aware optimisation also tends to yield complete but simpler final pipelines, indicating that improved fairness does not necessarily require increased model complexity.
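The paper does not publish its exact objective here, but the idea of combining complementary fairness metrics with predictive performance during optimisation can be sketched as a scalarised score. The metric names, the equal weighting, and the `alpha` trade-off parameter below are illustrative assumptions, not the authors' formulation:

```python
import numpy as np

def demographic_parity_diff(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups."""
    rates = [y_pred[group == g].mean() for g in (0, 1)]
    return abs(rates[0] - rates[1])

def equal_opportunity_diff(y_true, y_pred, group):
    """Absolute difference in true-positive rates between two groups."""
    tprs = []
    for g in (0, 1):
        mask = (group == g) & (y_true == 1)
        tprs.append(y_pred[mask].mean())
    return abs(tprs[0] - tprs[1])

def combined_objective(y_true, y_pred, group, alpha=0.5):
    """Illustrative scalarised objective: accuracy minus an averaged
    penalty over two complementary fairness metrics. alpha and the
    equal weighting are assumptions made for this sketch."""
    accuracy = (y_true == y_pred).mean()
    penalty = 0.5 * (demographic_parity_diff(y_pred, group)
                     + equal_opportunity_diff(y_true, y_pred, group))
    return accuracy - alpha * penalty
```

Using two metrics with different notions of fairness (group outcome rates vs. error rates) is what "complementary" means here: a pipeline that games one metric is still penalised by the other.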

Abstract

Machine Learning (ML) systems are increasingly used to support decision-making processes that affect individuals. However, these systems often rely on biased data, which can lead to unfair outcomes for specific groups. With the growing adoption of Automated Machine Learning (AutoML), the risk of intensifying discriminatory behaviours increases, as most frameworks focus primarily on model selection to maximise predictive performance. Previous research on fairness in AutoML has largely followed this trend, integrating fairness awareness only into model selection or hyperparameter tuning while neglecting other critical stages of the ML pipeline. This paper studies the impact of integrating fairness directly into the optimisation component of an AutoML framework that constructs complete ML pipelines, from data selection and transformation to model selection and tuning. As selecting appropriate fairness metrics remains a key challenge, our work incorporates complementary fairness metrics to capture different dimensions of fairness during optimisation. Their integration within AutoML resulted in measurable differences compared to a baseline focused solely on predictive performance: despite a 9.4% decrease in predictive power, average fairness improved by 14.5%, accompanied by a 35.7% reduction in data usage. Furthermore, fairness integration produced complete yet simpler final solutions, suggesting that model complexity is not always required to achieve balanced and fair ML solutions.
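To make the pipeline-level framing concrete, here is a toy sketch of fairness-aware search over whole configurations, where each candidate couples a data-selection choice (feature subset) with a tuning choice (decision threshold) and is scored with a fairness-penalised accuracy. The "model", the search procedure, and the demographic-parity penalty are all assumptions for illustration, not the framework described in the paper:

```python
import random
import numpy as np

def score(config, X, y_true, group):
    """Fairness-penalised score for one candidate pipeline configuration.
    A configuration is (feature subset, decision threshold): a toy
    stand-in for the data-selection and model-tuning stages."""
    cols, thr = config
    # Toy "model": predict positive when the mean of the selected
    # features exceeds the threshold.
    y_pred = (X[:, cols].mean(axis=1) > thr).astype(int)
    acc = (y_pred == y_true).mean()
    # Fairness penalty: demographic parity difference between groups.
    dp = abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())
    return acc - 0.5 * dp

def random_search(X, y_true, group, n_iter=50, seed=0):
    """Random search over (feature subset, threshold) configurations,
    keeping the best fairness-penalised score."""
    rng = random.Random(seed)
    n_feat = X.shape[1]
    best, best_s = None, -np.inf
    for _ in range(n_iter):
        cols = rng.sample(range(n_feat), rng.randint(1, n_feat))
        thr = rng.uniform(0.0, 1.0)
        s = score((cols, thr), X, y_true, group)
        if s > best_s:
            best, best_s = (cols, thr), s
    return best, best_s
```

Because the penalty applies to the whole configuration, the search can trade away a feature (reducing data usage) when dropping it improves fairness more than it costs accuracy, which mirrors the data-usage reduction reported above.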