CURE-OOD: Benchmarking Out-of-Distribution Detection for Survival Prediction

arXiv cs.CV / 5/4/2026

Key Points

  • The paper introduces CURE-OOD, a new benchmark to systematically evaluate out-of-distribution (OOD) detection in cancer survival prediction under controlled imaging acquisition shifts.
  • It addresses a gap in prior survival-prediction work, where CT-based models can suffer reliability issues when scanner and acquisition parameter variations create OOD samples.
  • CURE-OOD organizes the data into scanner-parameter-based training splits and both in-distribution (ID) and OOD test splits across four survival prediction tasks.
  • The authors find that covariate shifts notably reduce survival prediction performance and that mainstream, classification-oriented OOD detectors can fail in survival prediction.
  • They provide HazardDev as a simple survival-aware baseline for OOD detection to support fair comparison and further analysis.
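
The split design described in the key points can be illustrated with a minimal sketch. It assumes each scan is a record carrying a single acquisition parameter; the field name `slice_thickness_mm`, the 2.0 mm cutoff, and the helper `split_by_acquisition` are hypothetical and are not taken from the paper, whose actual splitting rules are not given in this summary.

```python
import random

def split_by_acquisition(records, key, is_id, test_fraction=0.2, seed=0):
    """Partition records into train / ID-test / OOD-test sets.

    Records whose acquisition parameter satisfies `is_id` are treated as
    in-distribution and divided into train and ID-test; all remaining
    records form the OOD-test set. Hypothetical helper, for illustration.
    """
    in_dist = [r for r in records if is_id(r[key])]
    ood_test = [r for r in records if not is_id(r[key])]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(in_dist)

    n_test = int(len(in_dist) * test_fraction)
    return {
        "train": in_dist[n_test:],
        "id_test": in_dist[:n_test],
        "ood_test": ood_test,
    }

# Toy data: 10 thin-slice scans (treated as ID) and 5 thick-slice scans (OOD).
records = (
    [{"id": i, "slice_thickness_mm": 1.0} for i in range(10)]
    + [{"id": 10 + i, "slice_thickness_mm": 5.0} for i in range(5)]
)
splits = split_by_acquisition(
    records, "slice_thickness_mm", is_id=lambda t: t <= 2.0
)
```

Because the model only ever trains on records that pass `is_id`, the OOD-test set isolates the acquisition shift while the ID-test set gives the matched reference point.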

Abstract

"How long can I live and remain free of cancer?" is often the first question a patient asks after receiving a cancer diagnosis and treatment. Accurate survival prediction helps alleviate psychological distress and supports risk stratification and personalized treatment planning. Recent survival prediction frameworks have shown strong performance using computed tomography (CT) images. However, variations in imaging acquisition introduce out-of-distribution (OOD) samples caused by covariate shifts that undermine model reliability. Despite this challenge, to our knowledge, no existing benchmark systematically studies OOD detection in cancer survival prediction. To address this gap, we introduce the Cancer sURvival bEnchmark for OOD Detection (CURE-OOD), the first benchmark for systematically evaluating OOD detection in survival prediction under controlled acquisition-induced distribution shifts. CURE-OOD defines scanner-parameter-based training, in-distribution (ID), and OOD test splits across four survival prediction tasks. Our experiments show that covariate shifts notably reduce survival prediction performance. They also show that mainstream classification-oriented OOD detectors can fail in survival prediction. Finally, we include HazardDev as a simple survival-aware reference baseline for OOD detection. CURE-OOD enables systematic analysis of how distribution shifts affect both downstream survival performance and OOD detectability.
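
The two quantities the abstract refers to, downstream survival performance and OOD detectability, are commonly measured with the concordance index and AUROC respectively. The sketch below shows both in plain Python. It is an assumption that these are the relevant metrics, since this summary does not state which ones the paper reports, and the function names and toy values are illustrative only. It does not implement HazardDev, which is not described here.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event and
    a shorter follow-up time than subject j. It is concordant when the
    model assigns i the higher risk. Tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

def auroc(id_scores, ood_scores):
    """AUROC for OOD detection, where a higher score means 'more OOD'.

    Computed as the probability that a random OOD sample scores above a
    random ID sample, with ties counted as half.
    """
    wins = 0.0
    for o in ood_scores:
        for s in id_scores:
            if o > s:
                wins += 1.0
            elif o == s:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))

# Toy survival data: follow-up times, event indicators (1 = event observed,
# 0 = censored), and predicted risks.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
risks = [4.0, 3.0, 2.0, 1.0]
c_index = concordance_index(times, events, risks)

# Toy OOD scores from a hypothetical detector.
detectability = auroc(id_scores=[0.1, 0.2, 0.3], ood_scores=[0.25, 0.9])
```

Comparing the C-index on ID and OOD test sets quantifies the performance drop under shift, while the AUROC indicates how well a detector separates the two sets. An AUROC near 0.5 means the detector is no better than chance.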