Probabilistic Joint and Individual Variation Explained (ProJIVE) for Data Integration

arXiv cs.LG / 3/16/2026

Key Points

  • ProJIVE is a probabilistic model for the JIVE framework, fitted with an expectation-maximization (EM) algorithm. It separates variation shared across multiple data sets collected on the same subjects from variation unique to each data set, extending probabilistic PCA to multiple data sets.
  • Joint and individual components are estimated simultaneously by maximum likelihood, which can yield greater accuracy than other methods.
  • In an application to brain morphometry and cognitive measures in Alzheimer's disease (ADNI data), the joint morphometry and cognition subject scores are strongly related to more expensive existing biomarkers.
  • Code to reproduce the analysis is available on the authors' GitHub page.
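The "variation explained" in the method's name can be read directly off a fitted probabilistic model. The sketch below assumes each data set k is generated as x_k = W_joint c_J + W_indiv c_k + e_k, with standard-normal joint and individual scores and isotropic noise of variance sigma2; this parameterization and the helper name `variance_explained` are illustrative assumptions, not taken from the ProJIVE code.

```python
import numpy as np


def variance_explained(W_joint, W_indiv, sigma2):
    """Split the model-implied total variance of one data set into
    joint, individual, and noise shares (hypothetical helper).

    Assumed model for data set k with p_k features:
        x_k = W_joint @ c_J + W_indiv @ c_k + e_k
        c_J ~ N(0, I), c_k ~ N(0, I), e_k ~ N(0, sigma2 * I)
    so Cov(x_k) = W_joint W_joint^T + W_indiv W_indiv^T + sigma2 * I.
    """
    p = W_joint.shape[0]
    joint = np.sum(W_joint ** 2)   # trace(W_joint @ W_joint.T)
    indiv = np.sum(W_indiv ** 2)   # trace(W_indiv @ W_indiv.T)
    noise = sigma2 * p             # trace(sigma2 * I_p)
    total = joint + indiv + noise
    return {
        "joint": float(joint / total),
        "individual": float(indiv / total),
        "noise": float(noise / total),
    }


# Toy loadings: 4 features, 1 joint and 2 individual components.
shares = variance_explained(np.ones((4, 1)), np.full((4, 2), 0.5), sigma2=0.5)
print(shares)  # {'joint': 0.5, 'individual': 0.25, 'noise': 0.25}
```

Because the three traces add up to the trace of the model covariance, the shares always sum to one; in practice the loadings and noise variance would come from a fitted model rather than being set by hand.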

Abstract

Collecting multiple types of data on the same set of subjects is common in modern scientific applications, including genomics, metabolomics, and neuroimaging. Joint and Individual Variation Explained (JIVE) seeks a low-rank approximation of the joint variation between two or more sets of features captured on common subjects and isolates this variation from the variation unique to each set of features. We develop an expectation-maximization (EM) algorithm to estimate a probabilistic model for the JIVE framework. The model extends probabilistic principal component analysis to multiple data sets. Our maximum likelihood approach estimates joint and individual components simultaneously, which can lead to greater accuracy than other methods. We apply ProJIVE to measures of brain morphometry and cognition in Alzheimer's disease. ProJIVE learns biologically meaningful sources of variation, and the joint morphometry and cognition subject scores are strongly related to more expensive existing biomarkers. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Code to reproduce the analysis is available on our GitHub page.
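The EM idea described above can be sketched by treating the stacked data sets as probabilistic PCA with a loading matrix whose blocks are forced to zero wherever a data set would load on another data set's individual scores. This is a minimal sketch, assuming centered data, one isotropic noise variance per data set, and no orthogonality or other identifiability constraints between joint and individual loadings. The function names (`fit_projive_em`, `log_likelihood`, and the private helpers) are hypothetical, and this is not the authors' GitHub implementation.

```python
import numpy as np

# Hypothetical sketch of EM for a ProJIVE-style model (not the authors' code).
# Assumed model for subject i and data set k:
#   x_ik = W_Jk c_Ji + W_Ik c_ki + e_ik,  c ~ N(0, I),  e_ik ~ N(0, sigma2_k I)
# Stacking all data sets gives probabilistic PCA with latent z = [c_J, c_1, ..., c_K]
# and a loading matrix W whose blocks are zero where data set k does not load.


def _block_layout(p, r_joint, r_indiv):
    """Row slice of each data set and its active latent columns [joint, own individual]."""
    starts = np.concatenate([[0], np.cumsum(p)])
    rows = [slice(starts[k], starts[k + 1]) for k in range(len(p))]
    offs = np.cumsum([r_joint] + list(r_indiv))
    active = [np.r_[0:r_joint, offs[k]:offs[k] + r_indiv[k]] for k in range(len(p))]
    return rows, active


def _e_step(X, W, sigma2, p):
    """Posterior means of z_i (one row per subject) and the shared posterior covariance."""
    psi_inv = np.repeat(1.0 / sigma2, p)  # diagonal of Psi^{-1}
    S = np.linalg.inv(np.eye(W.shape[1]) + (W.T * psi_inv) @ W)
    return X @ (psi_inv[:, None] * W) @ S, S


def log_likelihood(X, W, sigma2, p):
    """Marginal Gaussian log-likelihood with covariance W W^T + Psi."""
    n, P = X.shape
    C = W @ W.T + np.diag(np.repeat(sigma2, p))
    _, logdet = np.linalg.slogdet(C)
    quad = np.sum(X * np.linalg.solve(C, X.T).T)
    return -0.5 * (n * P * np.log(2 * np.pi) + n * logdet + quad)


def fit_projive_em(X_list, r_joint, r_indiv, n_iter=100, seed=0):
    """X_list: centered (n, p_k) arrays whose rows are the same subjects."""
    p = [Xk.shape[1] for Xk in X_list]
    X = np.hstack(X_list)
    n = X.shape[0]
    rows, active = _block_layout(p, r_joint, r_indiv)
    rng = np.random.default_rng(seed)
    W = np.zeros((X.shape[1], r_joint + sum(r_indiv)))
    for rk, A in zip(rows, active):
        W[rk, A] = rng.normal(scale=0.1, size=(rk.stop - rk.start, len(A)))
    sigma2 = np.array([Xk.var() for Xk in X_list])
    for _ in range(n_iter):
        Ez, S = _e_step(X, W, sigma2, p)  # E-step
        Ezz = n * S + Ez.T @ Ez  # sum_i E[z_i z_i^T]
        for k, (rk, A) in enumerate(zip(rows, active)):  # M-step, one data set at a time
            Xk, EzA, EzzA = X[:, rk], Ez[:, A], Ezz[np.ix_(A, A)]
            Wk = np.linalg.solve(EzzA, EzA.T @ Xk).T  # least squares on active columns only
            W[rk, :] = 0.0
            W[rk, A] = Wk
            sse = np.sum(Xk**2) - 2.0 * np.sum((EzA @ Wk.T) * Xk) + np.trace(EzzA @ Wk.T @ Wk)
            sigma2[k] = sse / (n * p[k])
    scores, _ = _e_step(X, W, sigma2, p)  # posterior means; first r_joint columns are joint scores
    return W, sigma2, scores


# Toy data: one joint and one individual component per data set.
rng = np.random.default_rng(1)
n = 300
cJ, c1, c2 = (rng.normal(size=(n, 1)) for _ in range(3))
X1 = cJ @ rng.normal(size=(1, 10)) + c1 @ rng.normal(size=(1, 10)) + 0.3 * rng.normal(size=(n, 10))
X2 = cJ @ rng.normal(size=(1, 8)) + c2 @ rng.normal(size=(1, 8)) + 0.3 * rng.normal(size=(n, 8))
X1, X2 = X1 - X1.mean(0), X2 - X2.mean(0)
W, sigma2, scores = fit_projive_em([X1, X2], r_joint=1, r_indiv=[1, 1], n_iter=200)
```

Because the assumed noise covariance is diagonal, the M-step decouples into one least-squares problem per feature over only that feature's active latent columns, which is how the structural zeros are preserved; the noise update then maximizes the expected complete-data likelihood for each data set given its new loadings. The returned `scores` are posterior means E[z | x], so their first `r_joint` columns play the role of joint subject scores in this sketch.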