A Deep Generative Approach to Stratified Learning
arXiv stat.ML / 4/14/2026
Key Points
- The paper argues that many datasets are better represented as stratified spaces (unions of manifolds with varying dimensions) rather than a single manifold, and frames stratified learning as challenging because of the varying dimensionality and the singularities where strata intersect.
- It introduces two deep generative frameworks for learning distributions on stratified spaces: a sieve maximum likelihood method using a dimension-aware mixture of VAEs, and a diffusion-based method that leverages the score-field structure of a mixture.
- The authors provide theoretical convergence rates for learning both ambient and intrinsic distributions, showing that the rates depend on the intrinsic dimensions and smoothness of the strata, as well as on the ambient noise level.
- Beyond distribution learning, the work analyzes the score field geometry to establish consistency guarantees for estimating intrinsic dimensions per stratum and proposes an algorithm to infer the number of strata and their dimensions.
- Extensive simulations and real-data experiments, including molecular dynamics, are used to demonstrate the effectiveness of the proposed approaches.
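To make the stratified-space setting concrete: the toy example below builds a union of a 1-D segment and a 2-D patch in R^3 (meeting at the origin, with small ambient noise) and estimates the intrinsic dimension near a query point with local PCA. This is only an illustrative sketch of the problem the paper addresses; the paper's own estimator is based on score-field geometry, not the simple local PCA used here, and all function names and thresholds below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stratum A: a 1-D line segment along the x-axis in R^3.
line = np.zeros((300, 3))
line[:, 0] = rng.uniform(-1, 1, 300)

# Stratum B: a 2-D square patch in the yz-plane, meeting the line at the origin.
plane = np.zeros((300, 3))
plane[:, 1] = rng.uniform(-1, 1, 300)
plane[:, 2] = rng.uniform(-1, 1, 300)

# Union of strata, corrupted by small ambient Gaussian noise.
X = np.vstack([line, plane]) + 0.005 * rng.normal(size=(600, 3))

def local_intrinsic_dim(X, query, k=30, var_frac=0.95):
    """Estimate intrinsic dimension near `query`: PCA on the k nearest
    neighbors, counting components needed to explain var_frac of variance."""
    d2 = np.sum((X - query) ** 2, axis=1)
    nbrs = X[np.argsort(d2)[:k]]
    nbrs = nbrs - nbrs.mean(axis=0)
    evals = np.linalg.eigvalsh(nbrs.T @ nbrs)[::-1]  # descending eigenvalues
    frac = np.cumsum(evals) / evals.sum()
    return int(np.searchsorted(frac, var_frac) + 1)

print(local_intrinsic_dim(X, np.array([0.8, 0.0, 0.0])))  # deep in the 1-D stratum
print(local_intrinsic_dim(X, np.array([0.0, 0.7, 0.7])))  # deep in the 2-D stratum
```

Away from the intersection, the local estimate recovers each stratum's dimension (1 for the segment, 2 for the patch); near the origin, neighborhoods mix points from both strata, which is exactly the singularity issue the key points highlight.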
