Multimodal Training to Unimodal Deployment: Leveraging Unstructured Data During Training to Optimize Structured Data Only Deployment

arXiv cs.LG / 3/25/2026


Key Points

  • The paper introduces a multimodal training approach that uses unstructured EHR elements (e.g., clinical notes) during training while producing a model deployable with only structured EHR fields.
  • It trains a “teacher” model that leverages note embeddings (via BioClinicalBERT) alongside structured embeddings (demographics and medical codes), and distills knowledge into a structured-only “student” model using contrastive learning and contrastive knowledge distillation.
  • Experiments on a cohort of 3,466 children evaluated for late talking show an AUROC of 0.705 for the deployed structured-only model, up from 0.656 for a structured-only baseline.
  • The results suggest that unstructured clinical context can help the model learn which aspects of structured data are task-relevant, without requiring unstructured inputs at inference time.
  • The work is positioned as enabling more practical deployment of phenotype/classification models where note access is limited or difficult in production.

Abstract

Unstructured Electronic Health Record (EHR) data, such as clinical notes, contain clinical contextual observations that are not directly reflected in structured data fields. This additional information can substantially improve model learning. However, due to their unstructured nature, these data are often unavailable or impractical to use when deploying a model. We introduce a multimodal learning framework that leverages unstructured EHR data during training while producing a model that can be deployed using only structured EHR data. Using a cohort of 3,466 children evaluated for late talking, we generated note embeddings with BioClinicalBERT and encoded structured embeddings from demographics and medical codes. A note-based teacher model and a structured-only student model were jointly trained using contrastive learning and a contrastive knowledge distillation loss, producing a strong note-based classifier (AUROC = 0.985). Our proposed model reached an AUROC of 0.705, outperforming the structured-only baseline of 0.656. These results demonstrate that incorporating unstructured data during training enhances the model's capacity to identify task-relevant information within structured EHR data, enabling a deployable structured-only phenotype model.
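To make the teacher–student alignment concrete, below is a minimal NumPy sketch of an InfoNCE-style contrastive knowledge-distillation loss: each structured-only student embedding is pulled toward its paired note-based teacher embedding and pushed away from other patients' teacher embeddings in the batch. This is an illustrative assumption about the loss form, not the paper's exact implementation; the function name, embedding dimensions, and temperature are hypothetical, and the BioClinicalBERT encoder and loss weighting are not reproduced here.

```python
import numpy as np

def contrastive_kd_loss(student_emb, teacher_emb, temperature=0.1):
    """InfoNCE-style contrastive KD loss (illustrative sketch).

    student_emb: (batch, dim) embeddings from the structured-only student.
    teacher_emb: (batch, dim) embeddings from the note-based teacher.
    Row i of each array corresponds to the same patient (the positive pair).
    """
    # L2-normalize so dot products are cosine similarities
    s = student_emb / np.linalg.norm(student_emb, axis=1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)

    # (batch, batch) similarity matrix, scaled by temperature
    logits = s @ t.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability

    # log-softmax over teacher candidates for each student row
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # positives sit on the diagonal: student_i should match teacher_i
    return -np.mean(np.diag(log_probs))
```

In use, minimizing this term transfers the teacher's note-informed notion of patient similarity into the student, so at deployment the student needs only the structured inputs:

```python
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 32))
student = rng.normal(size=(8, 32))
loss = contrastive_kd_loss(student, teacher)  # high when unaligned
```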