Useful nonrobust features are ubiquitous in biomedical images

arXiv cs.LG / 4/27/2026


Key Points

  • The study investigates whether deep learning models for medical imaging rely on “nonrobust features”: input patterns that are not human-interpretable yet predict class labels and are vulnerable to small adversarial perturbations.
  • Models trained only on nonrobust features still achieve accuracy well above chance on five MedMNIST classification tasks, indicating that these features are predictive in-distribution (see the sketch after this list).
  • Adversarial training shifts a model's reliance toward more robust features, which reduces in-distribution accuracy but improves performance under controlled distribution shifts, evaluated with MedMNIST-C.
  • The findings reveal a practical robustness–accuracy trade-off for medical image classification: emphasizing nonrobust features raises standard accuracy while harming out-of-distribution generalization, so the training method should be matched to the deployment setting.
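
The summary does not spell out how a “nonrobust features only” training set is built, but the standard construction from the nonrobust-features literature (Ilyas et al., 2019) relabels adversarially perturbed images with their attack target, so that only the perturbation-injected nonrobust features correlate with the new labels. Below is a minimal PyTorch sketch of that construction; the function names, the eps = 8/255 L-infinity budget, and the shift-by-one target scheme are illustrative assumptions, not necessarily the paper's exact protocol.

```python
import torch
import torch.nn.functional as F

def pgd_towards(model, x, target, eps=8/255, alpha=2/255, steps=40):
    """Targeted L-inf PGD: step the input down the cross-entropy loss
    w.r.t. `target`, projecting back into the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() - alpha * grad.sign()  # descend toward target
        x_adv = x + (x_adv - x).clamp(-eps, eps)      # project into eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                 # keep a valid image
    return x_adv.detach()

def make_nonrobust_dataset(model, loader, num_classes):
    """Relabel perturbed images with their attack target. Only the
    nonrobust features added by the attack correlate with the new
    labels, so a classifier trained on (x_adv, t) pairs can generalize
    to clean test data only by exploiting nonrobust features."""
    xs, ts = [], []
    for x, y in loader:
        t = (y + 1) % num_classes  # deterministic, label-independent targets
        xs.append(pgd_towards(model, x, t))
        ts.append(t)
    return torch.cat(xs), torch.cat(ts)
```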

Abstract

We study whether deep networks for medical imaging learn useful nonrobust features (predictive input patterns that are not human-interpretable and are highly susceptible to small adversarial perturbations) and how these features affect test performance. We show that models trained only on nonrobust features achieve accuracy well above chance across five MedMNIST classification tasks, confirming that these features are predictive in-distribution. Conversely, adversarially trained models that rely primarily on robust features sacrifice in-distribution accuracy but perform markedly better under controlled distribution shifts (MedMNIST-C). Overall, nonrobust features boost standard accuracy yet degrade out-of-distribution performance, revealing a practical robustness–accuracy trade-off in medical image classification; the training approach should therefore be tailored to the requirements of the deployment setting.
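
Adversarial training, the mechanism the abstract credits with shifting models toward robust features, replaces clean training inputs with worst-case perturbed ones. A minimal sketch of one such training epoch follows, assuming a PyTorch model and data loader; the untargeted PGD hyperparameters (eps = 8/255, 7 steps) are common defaults, not values reported by the paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """Untargeted L-inf PGD: find a perturbation delta inside the
    eps-ball that maximizes the loss on the true labels."""
    delta = torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta.detach() + alpha * grad.sign()).clamp(-eps, eps)
    return delta.detach()

def adversarial_training_epoch(model, loader, optimizer):
    """One epoch of adversarial training: fit the model on perturbed
    inputs so it learns features that survive the attack."""
    model.train()
    for x, y in loader:
        delta = pgd_attack(model, x, y)
        optimizer.zero_grad()
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        loss.backward()
        optimizer.step()
```

Because the inner attack removes the predictive signal carried by nonrobust features, the model is pushed toward robust features, which is consistent with the reported drop in clean accuracy alongside the gain under MedMNIST-C corruptions.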