Handling and Interpreting Missing Modalities in Patient Clinical Trajectories via Autoregressive Sequence Modeling

arXiv cs.AI / 4/22/2026

💬 Opinion · Models & Research

Key Points

  • The paper tackles the challenge of missing modalities in multimodal healthcare ML by reframing clinical diagnosis as autoregressive sequence modeling of a patient’s multimodal trajectory.
  • It proposes a missingness-aware contrastive pre-training objective that learns a shared latent space across modalities even when some are absent.
  • Using causal decoders adapted from large language models, the authors model temporal clinical signals while aiming to preserve interpretability.
  • Experiments on MIMIC-IV and eICU fine-tuning benchmarks show that transformer-based autoregressive sequence modeling outperforms baseline approaches.
  • Interpretability analysis finds that removing modalities can cause divergent model behavior across patient stays, and that the contrastive pre-training helps mitigate this issue.
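To make the second point concrete, here is a minimal sketch of what a missingness-aware contrastive objective could look like: an InfoNCE-style loss between two modality embeddings, computed only over patients for whom both modalities were actually observed. This is an illustration of the general idea, not the paper's exact objective; the function name, the two-modality restriction, and the presence mask are all assumptions.

```python
import numpy as np

def masked_infonce(z_a, z_b, present, temperature=0.1):
    """Illustrative contrastive (InfoNCE) loss between two modality
    embeddings, restricted to rows where both modalities are present.

    z_a, z_b : (N, D) L2-normalised embeddings of modalities A and B.
    present  : (N,) boolean mask, True where both modalities were observed.
    """
    # Rows with a missing modality contribute no contrastive signal.
    za, zb = z_a[present], z_b[present]
    logits = za @ zb.T / temperature  # pairwise cross-modal similarities
    # Each row's positive pair sits on the diagonal; softmax over the row.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

A quick sanity check: aligned embeddings (each patient matched with itself across modalities) should score a lower loss than deliberately mismatched ones, while the masked-out rows never enter the computation.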

Abstract

An active challenge in developing multimodal machine learning (ML) models for healthcare is handling missing modalities during training and deployment. As clinical datasets are inherently temporal and sparse in terms of modality presence, capturing the underlying predictive signal via diagnostic multimodal ML models while retaining model explainability remains an ongoing challenge. In this work, we address this by re-framing clinical diagnosis as an autoregressive sequence modeling task, utilizing causal decoders from large language models (LLMs) to model a patient's multimodal trajectory. We first introduce a missingness-aware contrastive pre-training objective that integrates multiple modalities into a shared latent space even in datasets with missingness. We then show that autoregressive sequence modeling with transformer-based architectures outperforms baselines on the MIMIC-IV and eICU fine-tuning benchmarks. Finally, we use interpretability techniques to move beyond performance boosts and find that across various patient stays, removing modalities leads to divergent behavior that our contrastive pre-training mitigates. By abstracting clinical diagnosis as sequence modeling and interpreting patient stay trajectories, we develop a framework to profile and handle missing modalities while addressing the canonical desideratum of safe, transparent clinical AI.
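The re-framing of a patient stay as an autoregressive sequence can be sketched as follows: flatten each timestep's modalities into an interleaved token stream, with an explicit placeholder token wherever a modality was not recorded, so the causal decoder sees missingness as part of the trajectory rather than as an error. The token names (`[VITALS]`, `[MISS]`, etc.) and the fixed modality order are assumptions for illustration, not the paper's actual vocabulary or tokenizer.

```python
def stay_to_sequence(timesteps):
    """Flatten a patient stay into one token sequence for a causal decoder.

    timesteps: list of dicts mapping a modality name to an observation
    token, with None marking a modality not recorded at that step.
    """
    seq = ["[BOS]"]
    for step in timesteps:
        # Fixed modality order keeps positions predictable for the decoder.
        for modality in ("vitals", "labs", "notes"):
            value = step.get(modality)
            seq.append(f"[{modality.upper()}]")
            # Missingness becomes an explicit token, not a silent gap.
            seq.append(value if value is not None else "[MISS]")
    seq.append("[EOS]")
    return seq
```

For example, a single timestep with vitals and a note but no labs yields `["[BOS]", "[VITALS]", "hr_hi", "[LABS]", "[MISS]", "[NOTES]", "n1", "[EOS]"]`, which a standard next-token objective can then model end to end.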