Opportunistic Cardiac Health Assessment: Estimating Phenotypes from Localizer MRI through Multi-Modal Representations

arXiv cs.CV / 3/17/2026

Key Points

  • The paper presents C-TRIP, a tri-modal framework that uses localizer MRI, ECG signals, and tabular metadata to estimate cardiac phenotypes without relying on cine CMR.
  • It follows a three-stage pipeline: uni-modal encoders are first trained independently, a fusion stage then unifies their latent spaces, and a final predictor is trained on the enriched representation for CP estimation, with inference performed exclusively on localizer MRI.
  • The approach exploits cheap localizers for spatial information and ECG for temporal patterns, augmented by patient context from metadata, and reports accurate functional CPs and high correlations for structural CPs.
  • Because localizers are fast and low-cost, C-TRIP could improve accessibility of CP estimation in clinical practice.
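
To make the fusion stage more concrete, here is a minimal sketch of one way three sets of embeddings could be pulled into a shared latent space. It assumes a symmetric InfoNCE-style contrastive loss summed over the three modality pairs. The summary does not state which objective C-TRIP uses, so the loss, the temperature value, and the function names `info_nce` and `tri_modal_alignment_loss` are all illustrative assumptions, not the authors' method.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def info_nce(anchors, positives, temperature=0.1):
    """Symmetric InfoNCE-style loss for paired embeddings.

    anchors[i] and positives[i] come from the same patient; every other
    pairing in the batch is treated as a negative. (Assumed objective.)
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    n = len(a)
    # Cosine similarity matrix, scaled by the temperature.
    sims = [
        [sum(x * y for x, y in zip(a[i], p[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy(logits, target):
        # Numerically stable log-sum-exp minus the target logit.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        return log_z - logits[target]

    # Anchor -> positive direction uses rows, the reverse uses columns.
    loss_rows = sum(cross_entropy(sims[i], i) for i in range(n)) / n
    loss_cols = sum(
        cross_entropy([sims[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (loss_rows + loss_cols)


def tri_modal_alignment_loss(z_localizer, z_ecg, z_tabular, temperature=0.1):
    """Sum the pairwise losses over the three modality pairs (assumed design)."""
    return (
        info_nce(z_localizer, z_ecg, temperature)
        + info_nce(z_localizer, z_tabular, temperature)
        + info_nce(z_ecg, z_tabular, temperature)
    )


if __name__ == "__main__":
    # Toy two-patient batch with 2-D embeddings (made-up values).
    z_loc = [[1.0, 0.0], [0.0, 1.0]]
    z_ecg = [[0.9, 0.1], [0.2, 0.8]]
    z_tab = [[0.8, 0.3], [0.1, 0.9]]
    print(tri_modal_alignment_loss(z_loc, z_ecg, z_tab))
```

Under this assumption, the loss is small when each patient's localizer, ECG, and tabular embeddings point in the same direction and large when they are mismatched. That is the property a localizer-only predictor would rely on at inference time.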

Abstract

Cardiovascular diseases are the leading cause of death. Cardiac phenotypes (CPs), e.g., ejection fraction, are the gold standard for assessing cardiac health, but they are derived from cine cardiac magnetic resonance imaging (CMR), which is costly and requires high spatio-temporal resolution. Every magnetic resonance (MR) examination begins with rapid and coarse localizers for scan planning, which are discarded thereafter. Despite their non-diagnostic image quality and lack of temporal information, localizers can provide valuable structural information rapidly. In addition to imaging, patient-level information, including demographics and lifestyle, influences the cardiac health assessment. Electrocardiograms (ECGs) are inexpensive, routinely ordered in clinical practice, and capture the temporal activity of the heart. Here, we introduce C-TRIP (Cardiac Tri-modal Representations for Imaging Phenotypes), a multi-modal framework that aligns localizer MRI, ECG signals, and tabular metadata to learn a robust latent space and predict CPs using localizer images as an opportunistic alternative to CMR. By combining these three modalities, we leverage cheap spatial and temporal information from localizers and ECG, respectively, while benefiting from the patient-specific context provided by tabular data. Our pipeline consists of three stages. First, encoders are trained independently to learn uni-modal representations. The second stage fuses the pre-trained encoders to unify the latent space. The final stage uses the enriched representation space for CP prediction, with inference performed exclusively on localizer MRI. The proposed C-TRIP yields accurate functional CPs and high correlations for structural CPs. Since localizers are inherently rapid and low-cost, our C-TRIP framework could make CP estimation more accessible.
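
As a small illustration of the quantities mentioned in the abstract, the sketch below computes ejection fraction from end-diastolic and end-systolic volumes using the standard definition, EF = (EDV - ESV) / EDV, and a Pearson correlation between predicted and reference values. The abstract does not say which correlation coefficient is reported, so Pearson is an assumption here. The volumes in the usage example are made up for illustration and are not results from the paper.

```python
import math


def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic (EDV) and
    end-systolic (ESV) ventricular volumes, both in millilitres."""
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def pearson_r(predicted, reference):
    """Pearson correlation between two equally long value sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    covariance = sum(
        (p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference)
    )
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return covariance / math.sqrt(var_p * var_r)


if __name__ == "__main__":
    # Illustrative volumes only: EDV = 120 mL, ESV = 50 mL.
    print(round(ejection_fraction(120.0, 50.0), 1))  # → 58.3
    # Hypothetical predicted vs. reference structural values.
    print(pearson_r([118.0, 140.0, 95.0], [120.0, 150.0, 100.0]))
```

Ejection fraction is a functional CP, since it depends on how volume changes over the cardiac cycle. Volumes such as EDV are structural, and a correlation between predicted and reference volumes is one natural way to score them.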