Disentangling Shared and Task-Specific Representations from Multi-Modal Clinical Data

arXiv cs.LG / 5/6/2026

📰 News · Models & Research

Key Points

  • The paper proposes a multi-task framework for multimodal clinical prediction that more carefully separates information that is shared across outcomes from signals that are specific to each outcome.
  • It introduces Orthogonal Task Decomposition (OrthTD), which splits patient representations into shared and task-specific subspaces and uses a geometric orthogonality constraint to reduce redundancy and mitigate negative transfer.
  • The approach is implemented on a unified Transformer architecture for multimodal fusion, aiming to balance shared representation learning with outcome-specific modeling.
  • Experiments on a real cohort of 12,430 surgical patients (predicting four outcomes) show improved performance, achieving an average AUC of 87.5% and AUPRC of 37.2%, with especially strong gains on AUPRC for rare-event detection.
  • The findings suggest that enforcing non-redundant shared/task-specific representations can enhance multi-outcome prediction from complex multimodal clinical datasets.
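The core idea in the bullets above — splitting a patient representation into shared and task-specific parts and penalizing overlap between them — can be sketched with a simple orthogonality penalty. This is a minimal illustration under assumptions, not the paper's implementation: the function name, matrix shapes, and use of a squared Frobenius norm are all hypothetical.

```python
import numpy as np

def orthogonality_penalty(Z_shared, Z_task):
    """Hypothetical sketch of a geometric orthogonality constraint:
    the squared Frobenius norm of the cross-correlation between a
    batch of shared representations Z_shared (n x d_s) and
    task-specific representations Z_task (n x d_t). The penalty is
    zero exactly when the two subspaces carry no redundant
    (linearly correlated) information."""
    cross = Z_shared.T @ Z_task          # (d_s, d_t) cross-correlation matrix
    return float(np.sum(cross ** 2))     # ||Z_shared^T Z_task||_F^2

# Perfectly disjoint subspaces incur zero penalty...
Z_s = np.eye(4)[:, :2]   # shared part lives in the first two axes
Z_t = np.eye(4)[:, 2:]   # task part lives in the last two axes
print(orthogonality_penalty(Z_s, Z_t))   # 0.0

# ...while random, overlapping representations are penalized.
rng = np.random.default_rng(0)
print(orthogonality_penalty(rng.normal(size=(32, 8)),
                            rng.normal(size=(32, 8))) > 0)   # True
```

In training, a term like this would be added to the multi-task loss with a weighting coefficient, pushing the shared and task-specific subspaces apart while the prediction losses shape their content.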

Abstract

Real-world clinical data is inherently multimodal, providing complementary evidence that mirrors the practical necessity of jointly assessing multiple related outcomes. Although multi-task learning can improve efficiency by sharing information across outcomes, existing approaches often fail to balance shared representation learning with outcome-specific modeling. Hard parameter sharing can trigger negative transfer when task gradients conflict, while flexible sharing may still entangle shared and task-specific signals. To address this, we propose a multi-task framework built on a unified Transformer for multimodal fusion, augmented with Orthogonal Task Decomposition (OrthTD) to split patient representations into shared and task-specific subspaces and impose a geometric orthogonality constraint that reduces redundancy and isolates task-specific signals. We evaluated OrthTD on a real-world cohort of 12,430 surgical patients for predicting four outcomes. OrthTD achieved an average AUC (area under the receiver operating characteristic curve) of 87.5% and an average AUPRC (area under the precision-recall curve) of 37.2%, consistently outperforming advanced tabular and multi-task methods. Notably, OrthTD achieved substantial gains in AUPRC, indicating superior performance in identifying rare events within imbalanced clinical data. These results suggest that enforcing non-redundant shared and task-specific representations can improve multi-outcome prediction from multimodal clinical data.
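The abstract reports average AUC and AUPRC across the four outcomes. A short sketch of how such multi-outcome averages are typically computed is below; the data here is synthetic and the 5% event rate, task count, and score construction are illustrative assumptions, not figures from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Synthetic multi-outcome setup: 1000 patients, 4 binary outcomes,
# each with a ~5% event rate to mimic rare-event imbalance.
rng = np.random.default_rng(42)
n_patients, n_tasks = 1000, 4
y_true = rng.binomial(1, 0.05, size=(n_patients, n_tasks))
# Scores: noise plus a boost for true positives, so the model is
# informative but imperfect (purely for illustration).
y_score = np.clip(y_true * 0.4 + rng.random((n_patients, n_tasks)), 0.0, 1.0)

# Per-task metrics, then macro-averaged across the four outcomes.
aucs = [roc_auc_score(y_true[:, t], y_score[:, t]) for t in range(n_tasks)]
auprcs = [average_precision_score(y_true[:, t], y_score[:, t]) for t in range(n_tasks)]
print(f"avg AUC   = {np.mean(aucs):.3f}")
print(f"avg AUPRC = {np.mean(auprcs):.3f}")
```

With heavy class imbalance, AUPRC is usually far below AUC even for a good model, which is why the paper's AUPRC gains on rare events are the more telling result.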