Handling Missing Modalities in Multimodal Survival Prediction for Non-Small Cell Lung Cancer

arXiv cs.CV / 4/27/2026


Key Points

  • The study addresses how to perform accurate survival prediction for non-small cell lung cancer (NSCLC) when some modalities (CT, WSI histopathology, or structured clinical data) are missing, a common limitation in real-world cohorts.
  • It proposes a missing-aware multimodal survival framework that uses foundation models for modality-specific feature extraction and an encoding strategy that allows intermediate multimodal fusion under naturally incomplete patient data.
  • The architecture is designed to use all available data during both training and inference, avoiding patient drop-off caused by complete-case filtering or crude imputation.
  • On unresectable stage II–III NSCLC, intermediate fusion improves over unimodal baselines and over early/late fusion, with the trimodal setup achieving a C-index of 74.42.
  • Modality-importance analyses show that the fusion model adapts its reliance on each modality to how informative that modality's representation is. Separately, the learned risk scores yield clinically meaningful stratification of progression and metastatic risk, with statistically significant log-rank tests across all modality combinations.
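The bullets above describe the missing-aware encoding only at a high level. The following is a minimal sketch of one common pattern, not the authors' encoding. It assumes fixed-size per-modality embeddings have already been extracted by foundation models. Each absent modality is filled with a placeholder vector, and a 0/1 availability mask is appended, so every patient produces a fusion-ready input and nobody is dropped. The names `MODALITIES`, `missing_aware_concat`, and `missing_tokens` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]

# Hypothetical modality order; the paper's inputs are CT, WSI and clinical data.
MODALITIES: Sequence[str] = ("ct", "wsi", "clinical")


def missing_aware_concat(
    features: Dict[str, Optional[Vector]],
    missing_tokens: Dict[str, Vector],
) -> Vector:
    """Return one fixed-length input vector for an intermediate-fusion head.

    Present modalities contribute their embedding. Absent modalities
    contribute a placeholder vector (a learned parameter in a real model,
    a plain list here). A 0/1 availability mask is appended so the fusion
    head can tell real embeddings from placeholders.
    """
    fused: Vector = []
    mask: Vector = []
    for name in MODALITIES:
        placeholder = missing_tokens[name]
        vec = features.get(name)  # a missing key is treated like None
        if vec is None:
            fused.extend(placeholder)
            mask.append(0.0)
        else:
            if len(vec) != len(placeholder):
                raise ValueError(
                    f"{name}: expected {len(placeholder)} dims, got {len(vec)}"
                )
            fused.extend(float(x) for x in vec)
            mask.append(1.0)
    return fused + mask


if __name__ == "__main__":
    tokens = {"ct": [0.0, 0.0], "wsi": [0.0, 0.0], "clinical": [0.0, 0.0]}
    # Patient without a histopathology slide: the WSI slot gets the placeholder.
    patient = {"ct": [0.3, -1.2], "wsi": None, "clinical": [1.0, 0.5]}
    print(missing_aware_concat(patient, tokens))
    # prints [0.3, -1.2, 0.0, 0.0, 1.0, 0.5, 1.0, 0.0, 1.0]
```

Concatenation with a mask is only one option. Masked attention over per-modality tokens is another common way to let a fusion layer ignore absent inputs.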

Abstract

Accurate survival prediction in Non-Small Cell Lung Cancer (NSCLC) requires integrating clinical, radiological, and histopathological data. Multimodal Deep Learning (MDL) can improve precision prognosis, but small cohorts and missing modalities limit its clinical applicability, as conventional approaches enforce complete-case filtering or imputation. We present a missing-aware multimodal survival framework that combines Computed Tomography (CT), Whole-Slide Histopathology Images (WSI), and structured clinical variables for overall survival modeling in unresectable stage II–III NSCLC. The framework uses Foundation Models (FMs) for modality-specific feature extraction and a missing-aware encoding strategy that enables intermediate multimodal fusion under naturally incomplete modality profiles. By design, the architecture processes all available data without dropping patients during training or inference. Intermediate fusion outperforms unimodal baselines and both early and late fusion strategies, with the trimodal configuration reaching a C-index of 74.42. Modality-importance analyses show that the fusion model adapts its reliance on each data stream according to representation informativeness, shaped by the alignment between FM pretraining objectives and the survival task. The learned risk scores produce clinically meaningful stratification of disease progression and metastatic risk, with statistically significant log-rank tests across all modality combinations, supporting the translational relevance of the proposed framework.
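The reported C-index and log-rank results rest on two standard survival statistics. The sketch below implements Harrell's C-index, a median split of predicted risk into high- and low-risk groups, and a two-group log-rank test in plain Python. On this scale the reported 74.42 would read as 0.7442, since it appears to be expressed as a percentage. The median cut-off and the function names are assumptions for illustration, because the abstract does not say how risk groups were formed. The toy data are invented and are not study data.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Fraction of comparable pairs whose predicted risks are correctly ordered.

    Pair (i, j) is comparable when patient i has an observed event and
    times[i] < times[j]; it is concordant when risks[i] > risks[j].
    Tied risks count as half; pairs tied in time are skipped (simplification).
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def median_split(risks: Sequence[float]) -> List[int]:
    """Hypothetical grouping rule: 1 = above-median risk, 0 = otherwise."""
    cut = statistics.median(risks)
    return [1 if r > cut else 0 for r in risks]


def log_rank_test(times: Sequence[float], events: Sequence[int],
                  groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value, 1 df)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        at_risk = [g for tt, g in zip(times, groups) if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)  # at risk overall / in group 1
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = o_minus_e ** 2 / variance
    # Chi-square (1 df) upper tail: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5, 6]  # toy follow-up times
    events = [1, 1, 1, 1, 1, 1]
    risks = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    print(harrell_c_index(times, events, risks))  # prints 1.0
    groups = median_split(risks)  # [1, 1, 1, 0, 0, 0]
    chi2, p = log_rank_test(times, events, groups)
    print(f"chi2={chi2:.2f}, p={p:.4f}")
```

Tied event times and censoring-weighted variants such as Uno's C-index are deliberately left out to keep the sketch short.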