Hierarchical Probabilistic Principal Component Analysis of Longitudinal Data

arXiv stat.ML / 4/27/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper argues that existing probabilistic PCA methods (e.g., PPCA) are not well-suited for longitudinal datasets that are both high-dimensional and have substantial missingness.
It proposes hierarchical probabilistic principal component analysis (HPPCA), a two-level probabilistic factor model that separates between-subject variability from time-varying within-subject dynamics.
HPPCA models within-subject latent factors using a Gaussian process and introduces an EM algorithm designed to handle missing data and flexible covariance kernels efficiently.
Simulation results show HPPCA substantially improves imputation accuracy over standard PPCA and multivariate functional PCA, even with heavy missingness and when the model is misspecified.
In a long COVID symptoms application, HPPCA captures hierarchical structure effectively and improves prediction of clinical outcomes and the recovery of masked clinical records compared with existing methods.

Abstract

In many longitudinal studies, a large number of variables are measured repeatedly over time, with substantial missing data. Existing methods, such as probabilistic principal component analysis (PPCA), are ill-equipped to handle such incomplete, high-dimensional longitudinal data, as they fail to account for the nested sources of variation and temporal dependency inherent in repeated measures. We introduce hierarchical probabilistic principal component analysis (HPPCA), a two-level probabilistic factor model that explicitly separates between-subject variance from time-varying within-subject dynamics. The within-subject latent factors are modeled by a Gaussian process. We develop an EM algorithm to handle missing data and flexible covariance kernels, accelerated by computationally efficient initializers. Simulation studies demonstrated that HPPCA robustly recovers model parameters subspaces and substantially outperforms both standard PPCA and multivariate functional PCA in imputation accuracy, even under heavy missingness and model misspecification. An application to the long COVID symptoms in the Researching COVID to Enhance Recovery adult cohort revealed that HPPCA effectively captured the data's hierarchical structure and its learned features significantly improved the prediction of clinical outcomes and the recovery of masked clinical records compared to exisiting methods.

Legal Insight Transformation: 7 Mistakes to Avoid When Adopting AI Tools

Dev.to

Legal Insight Transformation: Traditional vs. AI-Driven Research Compared

Dev.to

Legal Insight Transformation: A Beginner's Guide to Modern Research

Dev.to

I tested the same prompt across multiple AI models… the differences surprised me

Reddit r/artificial

The five loops between AI coding and AI engineering

Dev.to

Hierarchical Probabilistic Principal Component Analysis of Longitudinal Data

Key Points

Abstract

Related Articles

Legal Insight Transformation: 7 Mistakes to Avoid When Adopting AI Tools

Legal Insight Transformation: Traditional vs. AI-Driven Research Compared

Legal Insight Transformation: A Beginner's Guide to Modern Research

I tested the same prompt across multiple AI models… the differences surprised me

The five loops between AI coding and AI engineering

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer