Discriminative Representation Learning for Clinical Prediction

arXiv cs.LG / 3/24/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper challenges the common healthcare “foundation model” recipe of self-supervised, generative-style pretraining (reconstruction and large-scale representation learning objectives inherited from NLP and CV) followed by fine-tuning for clinical tasks.
  • It proposes an outcome-centric supervised representation learning framework that shapes embedding geometry by maximizing inter-class separation relative to within-class variance, concentrating model capacity along clinically meaningful axes (see the sketch after this list).
  • Experiments across multiple longitudinal electronic health record prediction tasks (including mortality and readmission) show consistent improvements over masked, autoregressive, and contrastive pretraining baselines when model capacity is matched.
  • The method is reported to improve discrimination, calibration, and sample efficiency while simplifying training to a single-stage optimization pipeline.
  • The authors argue that in “low entropy,” outcome-driven clinical domains where high-quality labels are available, direct outcome alignment may be the statistically optimal driver of representation learning, which would remove the assumption that large-scale self-supervised pretraining is required for strong performance.
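
The objective named in the second bullet, maximizing inter-class separation relative to within-class variance, reads as the classical Fisher-discriminant criterion applied to learned embeddings. Below is a minimal sketch of what such a loss could look like in PyTorch; the function name `fisher_ratio_loss`, the trace-form scatter terms, and the batch-level computation are illustrative assumptions, not the authors' implementation.

```python
import torch

def fisher_ratio_loss(embeddings: torch.Tensor,
                      labels: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical Fisher-style objective: shrink within-class scatter
    relative to between-class scatter over a batch of embeddings.

    embeddings: (N, D) learned representations
    labels:     (N,) integer outcome labels (e.g., mortality yes/no)
    """
    global_mean = embeddings.mean(dim=0)
    within = embeddings.new_zeros(())
    between = embeddings.new_zeros(())
    for c in labels.unique():
        class_emb = embeddings[labels == c]
        class_mean = class_emb.mean(dim=0)
        # trace of within-class scatter: spread of samples around their class mean
        within = within + ((class_emb - class_mean) ** 2).sum()
        # trace of between-class scatter: class mean vs. global mean, size-weighted
        between = between + class_emb.shape[0] * ((class_mean - global_mean) ** 2).sum()
    # maximizing between/within is equivalent to minimizing within/between
    return within / (between + eps)
```

In the single-stage pipeline the paper describes, a term like this would presumably be optimized jointly with an ordinary prediction loss, e.g. `loss = bce(logits, labels) + lam * fisher_ratio_loss(z, labels)`, so a single gradient pass shapes both the decision boundary and the embedding geometry.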

Abstract

Foundation models in healthcare have largely adopted self-supervised pretraining objectives inherited from natural language processing and computer vision, emphasizing reconstruction and large-scale representation learning prior to downstream adaptation. We revisit this paradigm in outcome-centric clinical prediction settings and argue that, when high-quality supervision is available, direct outcome alignment may provide a stronger inductive bias than generative pretraining. We propose a supervised deep learning framework that explicitly shapes representation geometry by maximizing inter-class separation relative to within-class variance, thereby concentrating model capacity along clinically meaningful axes. Across multiple longitudinal electronic health record tasks, including mortality and readmission prediction, our approach consistently outperforms masked, autoregressive, and contrastive pretraining baselines under matched model capacity. The proposed method improves discrimination, calibration, and sample efficiency, while simplifying the training pipeline to a single-stage optimization. These findings suggest that in low-entropy, outcome-driven healthcare domains, supervision can act as the statistically optimal driver of representation learning, challenging the assumption that large-scale self-supervised pretraining is a prerequisite for strong clinical performance.
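
For reference, the criterion the abstract describes matches the textbook Fisher ratio; the paper's exact formulation is not reproduced here, so treat the following as the standard form it evokes, with z_i the embeddings, μ_c and n_c the per-class means and sizes, and μ the global mean.

```latex
J(\theta) = \frac{\operatorname{tr}(S_B)}{\operatorname{tr}(S_W)},
\qquad
S_B = \sum_{c} n_c\, (\mu_c - \mu)(\mu_c - \mu)^{\top},
\qquad
S_W = \sum_{c} \sum_{i:\, y_i = c} (z_i - \mu_c)(z_i - \mu_c)^{\top}
```

Maximizing J pushes class means apart while tightening each class cluster, which is the "clinically meaningful axes" claim stated in geometric terms.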