Inference on covariance structure in high-dimensional multi-view data

arXiv stat.ML / 4/20/2026

Key Points

  • The paper addresses covariance estimation in high-dimensional multi-view data, improving on factor-analytic methods that use shared and view-specific latent factors.
  • It proposes a spectral decomposition approach that estimates and aligns latent factors active in at least one view; conditioning on these estimated factors yields a closed-form posterior without MCMC.
  • By using jointly conjugate priors for factor loadings and residual variances, the posterior factorizes into normal-inverse-gamma distributions for each variable, making computation simpler and more stable.
  • The authors provide theoretical results, including increasing-dimension asymptotic properties such as posterior contraction and central limit theorems for point estimators.
  • Simulations show strong performance with accurate uncertainty quantification, and the method is applied to integrate four high-dimensional views from a multi-omics dataset of cancer cell samples.
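
To make the conjugacy point concrete, here is a minimal sketch of a normal-inverse-gamma update for each variable, treating the factor matrix as known. The function name `nig_posterior` and the hyperparameters `tau2`, `a0`, and `b0` are assumptions chosen for illustration, and the zero-mean isotropic prior on the loadings is a textbook choice, not necessarily the prior used in the paper.

```python
import numpy as np

def nig_posterior(F, Y, tau2=1.0, a0=1.0, b0=1.0):
    """Closed-form normal-inverse-gamma posterior for each column of Y.

    Assumed model (illustrative, not taken from the paper), for variable j:
        y_j = F @ lam_j + eps,   eps ~ N(0, sigma2_j * I)
        lam_j | sigma2_j ~ N(0, sigma2_j * tau2 * I)
        sigma2_j ~ InverseGamma(a0, b0)
    F is n x k (factors treated as fixed), Y is n x p.
    """
    n, k = F.shape
    precision = F.T @ F + np.eye(k) / tau2      # posterior precision, shared by all j
    V = np.linalg.inv(precision)                # posterior scale matrix for the loadings
    M = V @ F.T @ Y                             # k x p matrix of posterior mean loadings
    a_n = a0 + n / 2.0                          # inverse-gamma shape
    quad = np.einsum("kj,kl,lj->j", M, precision, M)
    b_n = b0 + 0.5 * (np.sum(Y**2, axis=0) - quad)  # inverse-gamma rate, one per variable
    return M, V, a_n, b_n
```

Because the update is available in closed form and the variables are handled independently given the factors, no sampling is required: under these assumptions the posterior mean of each residual variance is `b_n / (a_n - 1)`, and the loadings of variable `j` have posterior mean `M[:, j]`.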

Abstract

This article focuses on covariance estimation for multi-view data. Popular approaches rely on factor-analytic decompositions that have shared and view-specific latent factors. Posterior computation is conducted via expensive and brittle Markov chain Monte Carlo (MCMC) sampling or variational approximations that underestimate uncertainty and lack theoretical guarantees. Our proposed methodology employs spectral decompositions to estimate and align latent factors that are active in at least one view. Conditionally on these factors, we choose jointly conjugate prior distributions for factor loadings and residual variances. The resulting posterior is a simple product of normal-inverse gamma distributions for each variable, bypassing MCMC and facilitating posterior computation. We prove favorable increasing-dimension asymptotic properties, including posterior contraction and central limit theorems for point estimators. We show excellent performance in simulations, including accurate uncertainty quantification, and apply the methodology to integrate four high-dimensional views from a multi-omics dataset of cancer cell samples.
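
The abstract does not spell out how the spectral step works, so the following is only a rough sketch of one plausible way to estimate factors per view and combine them into a common basis. It is a generic two-stage SVD construction, not the authors' procedure, and the names `view_factors` and `joint_factors` and the rank arguments `ks` and `k_total` are hypothetical.

```python
import numpy as np

def view_factors(Y, k):
    """Top-k left singular vectors of a centered n x p view, scaled so F'F = n * I."""
    Yc = Y - Y.mean(axis=0)
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    return np.sqrt(Y.shape[0]) * U[:, :k]

def joint_factors(views, ks, k_total):
    """Stack per-view factor estimates and extract a common orthogonal basis.

    views   : list of n x p_m arrays sharing the same n samples
    ks      : assumed number of factors active in each view
    k_total : assumed number of distinct factors active in at least one view
    """
    stacked = np.hstack([view_factors(Y, k) for Y, k in zip(views, ks)])
    U, s, _ = np.linalg.svd(stacked, full_matrices=False)
    # Directions shared across views appear more than once in `stacked`,
    # so they receive larger singular values than view-specific directions.
    return np.sqrt(stacked.shape[0]) * U[:, :k_total], s
```

In this sketch the second SVD plays the role of alignment: a factor estimated separately in two views collapses to a single column of the joint basis, which could then be passed as the fixed factor matrix to a conjugate update for the loadings and residual variances.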