Robust Fair Disease Diagnosis in CT Images

arXiv cs.CV / 4/14/2026


Key Points

  • The paper addresses unfair performance in chest-CT disease diagnosis models. It argues that the unfairness comes not only from demographic skew but also from a compound problem in which class imbalance and group underrepresentation overlap.
  • It proposes a two-level training objective. At the sample level, logit-adjusted cross-entropy calibrates decision margins by class frequency; at the group level, Conditional Value at Risk (CVaR) aggregation applies fairness pressure across demographic groups.
  • Experiments on the Fair Disease Diagnosis benchmark use a 3D ResNet-18 pretrained on Kinetics-400 to classify CT volumes as Adenocarcinoma, Squamous Cell Carcinoma, COVID-19, or Normal, with patient sex annotations serving as the demographic attribute.
  • Results report a gender-averaged macro F1 of 0.8403 with a small fairness gap (0.0239), corresponding to a 13.3% score improvement and a 78% reduction in demographic disparity relative to a baseline.
  • Ablation studies indicate that neither the sample-level adjustment nor the group-level CVaR component alone achieves the full gains, and the authors provide public code on GitHub.
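The sample-level component can be illustrated with the commonly used form of logit-adjusted cross-entropy, which adds τ·log π_c (a scaled log prior of class c, estimated from training-set counts) to every logit before the softmax, so that rare classes must be predicted with a larger margin during training. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the helper names `class_log_priors` and `logit_adjusted_ce`, the default `tau=1.0`, and the 90/10 class split in the usage example are all hypothetical.

```python
import math


def class_log_priors(class_counts):
    """Return log(pi_c) for each class, with pi_c estimated from training counts."""
    total = sum(class_counts)
    return [math.log(count / total) for count in class_counts]


def logit_adjusted_ce(logits, label, log_priors, tau=1.0):
    """Cross-entropy computed on logits shifted by tau * log(pi_c).

    Hypothetical helper: tau=1.0 is an assumed default, not a value from the paper.
    The shift is applied only inside the training loss; at inference the raw
    logits are used, so rare classes end up with a larger effective margin.
    """
    adjusted = [z + tau * lp for z, lp in zip(logits, log_priors)]
    m = max(adjusted)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_sum_exp - adjusted[label]


if __name__ == "__main__":
    # Hypothetical two-class split (90 common, 10 rare) with identical raw logits.
    log_priors = class_log_priors([90, 10])
    print(logit_adjusted_ce([0.0, 0.0], 0, log_priors))  # common class: -log(0.9) ≈ 0.1054
    print(logit_adjusted_ce([0.0, 0.0], 1, log_priors))  # rare class: log(10) ≈ 2.3026
```

With equal raw logits, the rare class incurs a loss of about 2.30 against about 0.105 for the common class, so optimization keeps pushing the rare-class logit upward. With uniform class counts the adjustment cancels and the function reduces to ordinary cross-entropy.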

Abstract

Automated diagnosis from chest CT has improved considerably with deep learning, but models trained on skewed datasets tend to perform unevenly across patient demographics. However, the situation is worse than simple demographic bias. In clinical data, class imbalance and group underrepresentation often coincide, creating compound failure modes that neither standard rebalancing nor fairness corrections can fix alone. We introduce a two-level objective that targets both axes of this problem. Logit-adjusted cross-entropy loss operates at the sample level, shifting decision margins by class frequency with provable consistency guarantees. Conditional Value at Risk aggregation operates at the group level, directing optimization pressure toward whichever demographic group currently has the higher loss. We evaluate on the Fair Disease Diagnosis benchmark using a 3D ResNet-18 pretrained on Kinetics-400, classifying CT volumes into Adenocarcinoma, Squamous Cell Carcinoma, COVID-19, and Normal classes with patient sex annotations. The training set illustrates the compound problem concretely: squamous cell carcinoma has 84 samples total, 5 of them female. The combined loss reaches a gender-averaged macro F1 of 0.8403 with a fairness gap of 0.0239, a 13.3% improvement in score and a 78% reduction in demographic disparity over the baseline. Ablations show that each component alone falls short. The code is publicly available at https://github.com/Purdue-M2/Fair-Disease-Diagnosis.
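The group-level step can be sketched as follows. Per-sample losses are averaged within each demographic group, and the group losses are then aggregated with CVaR at level α, the expected loss over the worst α fraction of group mass. The sketch uses the standard Rockafellar–Uryasev form, CVaR_α = min over η of η + (1/α)·Σ_g w_g·max(L_g − η, 0). It is an illustrative sketch, not the authors' code. The helper names, the uniform group weights, and the default `alpha=0.5` are assumptions, and the paper may weight groups differently or use another CVaR estimator. A real training loop would compute the same quantity on differentiable tensors rather than Python floats.

```python
from collections import defaultdict


def group_mean_losses(sample_losses, group_ids):
    """Average per-sample losses within each demographic group (e.g. patient sex)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for loss, group in zip(sample_losses, group_ids):
        sums[group] += loss
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


def cvar(losses, alpha, weights=None):
    """CVaR_alpha of discrete group losses via the Rockafellar-Uryasev form.

    The objective eta + (1/alpha) * sum_g w_g * max(L_g - eta, 0) is convex and
    piecewise linear with kinks at the group losses, so its minimum is attained
    at one of them. Weights default to uniform (an assumption) and are normalized.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if weights is None:
        weights = [1.0] * len(losses)
    total = sum(weights)
    weights = [w / total for w in weights]

    def objective(eta):
        excess = sum(w * max(loss - eta, 0.0) for w, loss in zip(weights, losses))
        return eta + excess / alpha

    return min(objective(eta) for eta in losses)


def two_level_objective(sample_losses, group_ids, alpha=0.5):
    """Hypothetical combination: group means of per-sample losses, then CVaR."""
    per_group = group_mean_losses(sample_losses, group_ids)
    return cvar(list(per_group.values()), alpha)


if __name__ == "__main__":
    # Hypothetical per-sample losses for three scans from two sex groups.
    print(two_level_objective([0.2, 0.6, 1.0], ["M", "M", "F"], alpha=0.5))  # 1.0
```

With two equally weighted groups and α = 0.5, CVaR equals the larger group loss, which matches the abstract's description of pressure directed at whichever group currently has the higher loss. At α = 1 it reduces to the plain mean over groups, and intermediate values of α interpolate between those two extremes.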