Fairboard: a quantitative framework for equity assessment of healthcare models

arXiv cs.LG / 14 April 2026


Key Points

  • The paper introduces Fairboard and evaluates equity across 18 open-source brain tumor segmentation models using 648 glioma patients and 11,664 inferences from two independent datasets.
  • Results show that patient identity explains more performance variance than which model is used, and clinical factors (e.g., molecular diagnosis, tumor grade, extent of resection) predict segmentation accuracy more strongly than model architecture.
  • A voxel-wise spatial meta-analysis reveals neuroanatomically localized, compartment-specific biases that are often consistent across different models.
  • In a high-dimensional latent space built from lesion masks plus clinico-demographic features, model performance clusters significantly, suggesting the patient feature space contains directions along which models are vulnerable.
  • While newer models show somewhat better equity, none offers a formal fairness guarantee; the authors therefore release Fairboard as an open-source, no-code dashboard for equitable model monitoring in medical imaging.
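The headline finding, that patient identity explains more performance variance than model choice, can be illustrated with a simple variance decomposition over a patients x models score table. The sketch below is a toy two-way sums-of-squares decomposition on synthetic Dice scores, not the paper's actual Bayesian multivariate analysis; all numbers and effect sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic Dice scores: rows = patients, cols = models.
# Patient effects are made larger than model effects to mimic
# the direction of the paper's result (values are invented).
n_patients, n_models = 648, 18
patient_effect = rng.normal(0.0, 0.10, size=(n_patients, 1))
model_effect = rng.normal(0.0, 0.02, size=(1, n_models))
noise = rng.normal(0.0, 0.03, size=(n_patients, n_models))
dice = 0.80 + patient_effect + model_effect + noise

def variance_explained(scores):
    """Crossed two-way design without replication: return the fraction
    of total variance attributable to rows (patients) and to columns
    (models), via classical sums of squares."""
    grand = scores.mean()
    ss_total = ((scores - grand) ** 2).sum()
    ss_rows = scores.shape[1] * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_cols = scores.shape[0] * ((scores.mean(axis=0) - grand) ** 2).sum()
    return ss_rows / ss_total, ss_cols / ss_total

pat_frac, mod_frac = variance_explained(dice)
print(f"patient share of variance: {pat_frac:.2f}, model share: {mod_frac:.2f}")
```

With these synthetic effect sizes the patient share dominates, which is the qualitative pattern the paper reports; the real analysis uses richer multivariate models rather than this ANOVA-style decomposition.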

Abstract

Despite there now being more than 1,000 FDA-authorised AI medical devices, formal equity assessments -- whether model performance is uniform across patient subgroups -- are rare. Here, we evaluate the equity of 18 open-source brain tumour segmentation models across 648 glioma patients from two independent datasets (n = 11,664 model inferences) along distinct univariate, Bayesian multivariate, spatial, and representational dimensions. We find that patient identity consistently explains more performance variance than model choice, with clinical factors, including molecular diagnosis, tumour grade, and extent of resection, predicting segmentation accuracy more strongly than model architecture. A voxel-wise spatial meta-analysis identifies neuroanatomically localised biases that are compartment-specific yet often consistent across models. Within a high-dimensional latent space of lesion masks and clinico-demographic features, model performance clusters significantly, indicating that the patient feature space contains axes of algorithmic vulnerability. Although newer models tend toward greater equity, none provides a formal fairness guarantee. Lastly, we release Fairboard, an open-source, no-code dashboard that lowers barriers to equitable model monitoring in medical imaging.
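The claim that performance "clusters" in patient feature space can be made concrete with a permutation test: if patients who are close in feature space have more similar scores than chance, performance is spatially structured along feature-space directions. The sketch below is a minimal nearest-neighbour permutation test on synthetic data; the feature dimensions, effect sizes, and test statistic are assumptions for illustration, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: 200 patients with 5-d lesion/clinico-demographic
# features; per-patient mean Dice depends on one feature axis
# (a hypothetical "vulnerability direction"), plus noise.
n, d = 200, 5
feats = rng.normal(size=(n, d))
dice = 0.85 - 0.10 * np.abs(feats[:, 0]) + rng.normal(0.0, 0.02, n)

def neighbour_gap(feats, scores):
    """Mean absolute score gap between each patient and their nearest
    feature-space neighbour; small values mean performance clusters."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)          # exclude self-matches
    nn = d2.argmin(axis=1)
    return np.abs(scores - scores[nn]).mean()

obs = neighbour_gap(feats, dice)
# Null: shuffle scores across patients, destroying any feature-space
# structure while keeping the score distribution fixed.
null = np.array([neighbour_gap(feats, rng.permutation(dice))
                 for _ in range(500)])
p = (null <= obs).mean()
print(f"observed gap: {obs:.4f}, null mean: {null.mean():.4f}, p ~ {p:.3f}")
```

Because the synthetic scores track a feature axis, the observed neighbour gap falls below the permutation null, and the same logic extends to the high-dimensional latent representations the paper analyses.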