Information-Geometric Decomposition of Generalization Error in Unsupervised Learning

arXiv stat.ML / 4/15/2026

Key Points

  • The paper provides an exact information-geometric decomposition of the generalization error of unsupervised learning (the expected KL divergence from the data distribution to the trained model) into three non-negative terms: model error, data bias, and variance; a schematic of the split is sketched just after this list.
  • The decomposition holds for any e-flat model class and is derived from the generalized Pythagorean theorem and a dual e-mixture variance identity from information geometry.
  • As a concrete demonstration, the authors analyze ε-PCA, a rank-regularized variant of PCA, and show that under a technical reformulation with the same total generalization error on isotropic Gaussian data, each component of the decomposition admits a closed-form expression.
  • The optimal rank emerges as the eigenvalue cutoff λ*_cut = ε: the model retains exactly those empirical eigenvalues that exceed the noise floor, with the cutoff set by a marginal-rate balance between the gain in model error and the cost in data bias.
  • A boundary comparison yields a three-regime phase diagram (retain-all, interior, collapse) whose transitions are set by the lower Marchenko–Pastur edge and an analytically computable collapse threshold ε*(α), where α is the dimension-to-sample-size ratio; all results are verified numerically.
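
One natural reading of the three-term split, consistent with the two identities named above (the notation below is ours, not the paper's; the precise projection and mixture definitions are in the paper):

```latex
% Schematic decomposition (assumed notation):
%   q                 : true data distribution
%   p_{\hat\theta(D)} : model trained on a sample D
%   p^*               : m-projection of q onto the e-flat model class
%   \bar p            : e-mixture (normalized geometric mean) of trained models
\mathbb{E}_{D}\!\left[D_{\mathrm{KL}}\bigl(q \,\|\, p_{\hat\theta(D)}\bigr)\right]
  = \underbrace{D_{\mathrm{KL}}\bigl(q \,\|\, p^{*}\bigr)}_{\text{model error}}
  + \underbrace{D_{\mathrm{KL}}\bigl(p^{*} \,\|\, \bar p\bigr)}_{\text{data bias}}
  + \underbrace{\mathbb{E}_{D}\!\left[D_{\mathrm{KL}}\bigl(\bar p \,\|\, p_{\hat\theta(D)}\bigr)\right]}_{\text{variance}}
```

On this reading, the generalized Pythagorean theorem splits off the model-error term at p*, and the dual e-mixture variance identity splits the remaining expected divergence into a bias term at p̄ and a variance term around it.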

Abstract

We decompose the Kullback–Leibler generalization error (GE) of unsupervised learning, the expected KL divergence from the data distribution to the trained model, into three non-negative components: model error, data bias, and variance. The decomposition is exact for any e-flat model class and follows from two identities of information geometry: the generalized Pythagorean theorem and a dual e-mixture variance identity. As an analytically tractable demonstration, we apply the framework to ε-PCA, a regularized principal component analysis in which the empirical covariance is truncated at rank N_K and the discarded directions are pinned at a fixed noise floor ε. Although rank-constrained ε-PCA is not itself e-flat, it admits a technical reformulation with the same total GE on isotropic Gaussian data, under which each component of the decomposition takes closed form. The optimal rank emerges as the cutoff λ*_cut = ε: the model retains exactly those empirical eigenvalues exceeding the noise floor, with the cutoff reflecting a marginal-rate balance between model-error gain and data-bias cost. A boundary comparison further yields a three-regime phase diagram (retain-all, interior, and collapse) separated by the lower Marchenko–Pastur edge and an analytically computable collapse threshold ε*(α), where α is the dimension-to-sample-size ratio. All claims are verified numerically.
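
As a concrete illustration of the machinery above, here is a minimal numerical sketch (our code, not the authors'): it forms the ε-PCA covariance estimate by keeping empirical eigenvalues above the noise floor ε and pinning the rest at ε, then compares ε against the lower Marchenko–Pastur edge (1 − √α)² for unit-variance isotropic Gaussian data. The function names and the unit-variance normalization are our assumptions; note that under this naive keep-if-above rule, collapse occurs once ε exceeds the largest empirical eigenvalue (near the upper MP edge), which need not coincide with the paper's analytic threshold ε*(α).

```python
import numpy as np

def eps_pca_covariance(X, eps):
    """epsilon-PCA covariance estimate (sketch, not the authors' code):
    empirical eigenvalues above the noise floor eps are retained;
    the discarded directions are pinned at eps."""
    n, d = X.shape
    S = X.T @ X / n                            # empirical covariance (mean assumed zero)
    evals, evecs = np.linalg.eigh(S)
    kept = evals > eps                         # cutoff rule lambda*_cut = eps
    evals_reg = np.where(kept, evals, eps)     # pin discarded eigenvalues at eps
    Sigma_hat = (evecs * evals_reg) @ evecs.T  # reassemble V diag(lambda) V^T
    return Sigma_hat, int(kept.sum())

# Demo on isotropic Gaussian data with identity covariance.
rng = np.random.default_rng(0)
n, d = 2000, 500                               # alpha = d / n = 0.25
alpha = d / n
X = rng.standard_normal((n, d))

mp_lower_edge = (1 - np.sqrt(alpha)) ** 2      # lower Marchenko-Pastur edge (unit variance)

for eps in [0.05, 1.0, 3.0]:
    _, rank = eps_pca_covariance(X, eps)
    if eps < mp_lower_edge:
        regime = "retain-all (eps below the MP lower edge)"
    elif rank == 0:
        regime = "collapse (all directions pinned at eps)"
    else:
        regime = "interior (partial truncation)"
    print(f"eps={eps:4.2f}  retained rank={rank:3d}/{d}  -> {regime}")
```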