AI Navigate

Medical AI gets 66% worse when you use automated labels for training, and the benchmark hides it! [R][P]

Reddit r/MachineLearning / 3/21/2026

📰 News · Ideas & Deep Analysis · Models & Research

Key Points

  • An ISBI 2026 paper reports that breast cancer segmentation models perform significantly worse for younger patients because tumors in this group are larger, more variable, and harder to learn from, beyond just higher breast density.
  • The bias is qualitative rather than simply due to density, indicating fundamental learning difficulties with younger-patient tumors.
  • Training with automated labels can amplify model bias by about 40%, and standard benchmarks may obscure this bias due to a 'biased ruler' effect.
  • The work underscores the need for clean, unbiased labels and evaluation protocols in medical imaging to accurately assess model fairness.
  • The findings were presented at ISBI 2026 (oral), signaling a notable research milestone in medical AI fairness.

A recent paper on fairness in medical segmentation of breast cancer tumors found that segmentation models perform substantially worse for younger patients.

The common explanation -- higher breast density = harder cases -- turns out not to be the whole story. The bias is qualitative: younger patients have tumors that are larger, more variable, and fundamentally harder to learn from, not just more of the same hard cases.

Another interesting finding: training on automated labels may amplify model bias by about 40%. But the benchmark does not show it, due to the 'biased ruler' effect -- when biased labels are also used to measure performance, they can mask the true error. This highlights the need for clean, unbiased labels in medical imaging evaluation.
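The 'biased ruler' effect can be illustrated with a toy simulation (this is a hypothetical sketch, not the paper's code or data): a model that inherits the systematic bias of its automated training labels scores near-perfectly when evaluated against those same biased labels, while scoring much lower against clean ground truth.

```python
import numpy as np

def dice(pred, ref):
    """Dice overlap between two binary masks."""
    inter = np.logical_and(pred, ref).sum()
    return 2.0 * inter / (pred.sum() + ref.sum())

# Clean ground-truth mask: a disc-shaped "tumor" on a 64x64 grid.
yy, xx = np.mgrid[:64, :64]
clean = (xx - 32) ** 2 + (yy - 32) ** 2 < 15 ** 2

# Hypothetical systematic label bias: the automated labeler erodes the
# tumor boundary (radius 12 instead of 15). A model trained on these
# labels learns to miss the same rim.
biased_label = (xx - 32) ** 2 + (yy - 32) ** 2 < 12 ** 2
model_pred = biased_label.copy()

# The biased ruler: scoring against biased labels looks perfect,
# while the score against clean ground truth reveals the real gap.
print(f"Dice vs biased labels: {dice(model_pred, biased_label):.3f}")  # identical masks -> 1.000
print(f"Dice vs clean labels:  {dice(model_pred, clean):.3f}")         # substantially lower
```

The radii, grid size, and disc shapes here are arbitrary illustrative choices; the point is only that a shared bias between model and evaluation labels cancels out in the metric.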

Paper - https://arxiv.org/abs/2511.00477 - International Symposium on Biomedical Imaging (ISBI) 2026 (oral)

submitted by /u/ade17_in