Fairness of Classifiers in the Presence of Constraints between Features

arXiv cs.AI / 5/4/2026


Key Points

  • The paper examines how standard fairness notions for classifiers—specifically independence from protected attributes like gender—can fail when constraints exist between features, masking underlying dependencies.
  • It proposes a new fairness criterion based on the existence of a “fair explanation,” defined as a prime-implicant reason for the decision that contains no protected features, accounting for feature constraints.
  • The authors find that ignoring constraints can drastically change whether a decision is considered fair under this explanation-based definition, even when protected and non-protected features are not directly constrained.
  • They analyze three definitions of classifier fairness (every decision has only fair explanations, every decision has at least one fair explanation, or changing protected features never changes the outcome) and study the computational complexity of testing these properties.
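The third definition in the last point (outcomes invariant under changes to protected features) can be sketched as a brute-force check over all inputs. The classifier, feature names, and protected set below are illustrative toys, not taken from the paper:

```python
from itertools import product

FEATURES = ["g", "x", "y"]   # hypothetical boolean features; "g" is protected
PROTECTED = {"g"}

def classifier(a):
    # Toy classifier: depends on the protected feature g only when x != y.
    return a["g"] if a["x"] != a["y"] else a["x"]

def protected_invariant(clf, features, protected):
    """Definition (3): flipping any protected feature never changes the outcome."""
    for bits in product([False, True], repeat=len(features)):
        a = dict(zip(features, bits))
        for p in protected:
            b = dict(a)
            b[p] = not b[p]  # flip one protected feature, keep the rest fixed
            if clf(a) != clf(b):
                return False
    return True
```

For the toy classifier above, `protected_invariant` returns `False` (the outcome flips with `g` whenever `x != y`), while a classifier that ignores `g` entirely would pass the check. The enumeration is exponential in the number of features, which matches the definition only, not any algorithm from the paper.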

Abstract

In Machine Learning, an accepted definition of fairness of a decision taken by a classifier is that it should not depend on protected features, such as gender. Unfortunately, when constraints exist between features, such dependencies can be obscured by the constraints. To avoid this problem, we propose that a decision be considered fair if it has a fair explanation. We define a fair explanation as a prime-implicant reason for the decision that does not contain any protected feature (where the constraints are taken into account in the definition of prime-implicant). Surprisingly, ignoring constraints can completely change the fairness of a decision (according to this definition) even in the absence of constraints between protected and unprotected features. Three possible definitions of fairness of a classifier are that for all its decisions (1) there are only fair explanations, (2) there is at least one fair explanation, or (3) changing protected features does not change the outcome. We identify the relationships between these different definitions of fairness and study the computational complexity of testing fairness of classifiers.
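The abstract's central definitions can be made concrete with a brute-force sketch: a prime-implicant reason is a minimal subset of the instance's feature values that entails the decision on every completion satisfying the constraints, and a decision is fair if some such reason avoids all protected features. The three-feature classifier and the constraint below are hypothetical; note how the constraint "x implies y" makes `{x=True}` a sufficient (and fair) reason even though, without the constraint, it would not be:

```python
from itertools import product, combinations

FEATURES = ["g", "x", "y"]   # hypothetical boolean features; "g" is protected
PROTECTED = {"g"}

def classifier(a):
    # Toy classifier: accept if x and y agree, or if the protected g is true.
    return (a["x"] == a["y"]) or a["g"]

def constraint(a):
    # Illustrative constraint between non-protected features: x implies y.
    return (not a["x"]) or a["y"]

def all_assignments():
    for bits in product([False, True], repeat=len(FEATURES)):
        yield dict(zip(FEATURES, bits))

def implies_decision(partial, decision):
    """True iff every constraint-satisfying completion of `partial`
    receives the same decision from the classifier."""
    for a in all_assignments():
        if not constraint(a):
            continue  # constraints are taken into account, as in the paper
        if all(a[f] == v for f, v in partial.items()):
            if classifier(a) != decision:
                return False
    return True

def prime_implicant_reasons(instance):
    """All minimal subsets of the instance's literals that entail its decision.
    Enumerating by increasing size guarantees minimality (primeness)."""
    decision = classifier(instance)
    reasons = []
    items = sorted(instance.items())
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            partial = dict(subset)
            if implies_decision(partial, decision):
                if not any(set(prev) <= set(partial.items()) for prev in reasons):
                    reasons.append(tuple(sorted(partial.items())))
    return reasons

def has_fair_explanation(instance):
    # Fair decision: at least one prime-implicant reason avoids protected features.
    return any(all(f not in PROTECTED for f, _ in reason)
               for reason in prime_implicant_reasons(instance))
```

For the instance `{g=True, x=True, y=True}`, the prime-implicant reasons are `{g=True}` and `{x=True}`: under the constraint, `x=True` forces `y=True`, so `x=True` alone entails acceptance. The decision is therefore fair in the explanation-based sense despite `g` also being sufficient. Dropping the constraint removes `{x=True}` as a reason, echoing the abstract's observation that constraints can completely change the fairness verdict. The exponential enumeration only mirrors the definitions; the paper studies the complexity of deciding these properties, not this naive procedure.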