Hierarchical Probabilistic Principal Component Analysis of Longitudinal Data
arXiv stat.ML / 4/27/2026
Key Points
- The paper argues that existing probabilistic PCA (PPCA) methods are poorly suited to longitudinal datasets that are high-dimensional and have substantial missingness.
- It proposes hierarchical probabilistic principal component analysis (HPPCA), a two-level probabilistic factor model that separates between-subject variability from time-varying within-subject dynamics.
- HPPCA models within-subject latent factors using a Gaussian process and introduces an EM algorithm designed to handle missing data and flexible covariance kernels efficiently.
- Simulation results show that HPPCA substantially improves imputation accuracy over standard PPCA and multivariate functional PCA, even under heavy missingness and model misspecification.
- In a long COVID symptoms application, HPPCA captures hierarchical structure effectively and improves prediction of clinical outcomes and the recovery of masked clinical records compared with existing methods.
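
To make the two-level structure described above more concrete, here is a minimal simulation sketch in Python. It is not the paper's specification: the summary does not give the model equations, so the additive form (a subject-level factor term plus a time-varying factor term plus Gaussian noise), the squared-exponential kernel, the missing-completely-at-random masking, and every name and default value (`simulate_two_level`, `W_between`, `W_within`, `length_scale`, and so on) are assumptions chosen for illustration only.

```python
import numpy as np


def rbf_kernel(times, length_scale=0.3):
    """Squared-exponential covariance between time points (assumed kernel)."""
    diff = times[:, None] - times[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def simulate_two_level(n_subjects=20, n_times=8, n_features=30,
                       q_between=2, q_within=2, noise_sd=0.1,
                       missing_rate=0.4, seed=0):
    """Draw data from a hypothetical two-level factor model, then mask entries.

    Assumed form: y_i(t) = W_between z_i + W_within u_i(t) + noise, where
    z_i is a subject-level Gaussian factor and each coordinate of u_i(.)
    is an independent zero-mean Gaussian process over time.
    """
    rng = np.random.default_rng(seed)
    times = np.linspace(0.0, 1.0, n_times)

    # Loading matrices for the between-subject and within-subject levels.
    W_between = rng.normal(size=(n_features, q_between))
    W_within = rng.normal(size=(n_features, q_within))

    # Cholesky factor of the kernel matrix; the jitter keeps it positive definite.
    K = rbf_kernel(times) + 1e-6 * np.eye(n_times)
    L = np.linalg.cholesky(K)

    Y = np.empty((n_subjects, n_times, n_features))
    for i in range(n_subjects):
        z = rng.normal(size=q_between)                # constant over time
        u = L @ rng.normal(size=(n_times, q_within))  # smooth in time
        signal = W_between @ z + u @ W_within.T       # shape (n_times, n_features)
        Y[i] = signal + noise_sd * rng.normal(size=(n_times, n_features))

    # Mask entries completely at random to mimic heavy missingness.
    mask = rng.random(Y.shape) < missing_rate
    Y_obs = np.where(mask, np.nan, Y)
    return times, Y, Y_obs, mask


times, Y, Y_obs, mask = simulate_two_level()
print(Y_obs.shape, f"fraction missing: {mask.mean():.2f}")
```

Keeping the complete array `Y` alongside the masked `Y_obs` mirrors the kind of evaluation the summary mentions: an imputation method is fit to `Y_obs`, and its reconstruction is scored only on the entries where `mask` is true.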