Information-Geometric Decomposition of Generalization Error in Unsupervised Learning

arXiv stat.ML / 4/15/2026

Key Points

  • The paper provides an exact information-geometric decomposition of the generalization error of unsupervised learning (the expected KL divergence from the data distribution to the trained model) into three non-negative terms: model error, data bias, and variance; a schematic of the split is sketched just after this list.
  • The decomposition holds for any e-flat model class and is derived from the generalized Pythagorean theorem and a dual e-mixture variance identity from information geometry.
  • As a concrete demonstration, the authors analyze ε-PCA, a rank-regularized variant of PCA, and show that under a technical reformulation with the same total generalization error on isotropic Gaussian data, each component of the decomposition admits a closed-form expression.
  • The optimal rank emerges as the eigenvalue cutoff λ*_cut = ε: the model retains exactly those empirical eigenvalues that exceed the noise floor, with the cutoff set by a marginal-rate balance between the gain in model error and the cost in data bias.
  • A boundary comparison yields a three-regime phase diagram (retain-all, interior, collapse) whose transitions are set by the lower Marchenko–Pastur edge and an analytically computable collapse threshold ε*(α), where α is the dimension-to-sample-size ratio; all results are verified numerically.
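
One natural reading of the three-term split, consistent with the two identities named above (the notation below is ours, not the paper's; the precise projection and mixture definitions are in the paper):

```latex
% Schematic decomposition (assumed notation):
%   q                 : true data distribution
%   p_{\hat\theta(D)} : model trained on a sample D
%   p^*               : m-projection of q onto the e-flat model class
%   \bar p            : e-mixture (normalized geometric mean) of trained models
\mathbb{E}_{D}\!\left[D_{\mathrm{KL}}\bigl(q \,\|\, p_{\hat\theta(D)}\bigr)\right]
  = \underbrace{D_{\mathrm{KL}}\bigl(q \,\|\, p^{*}\bigr)}_{\text{model error}}
  + \underbrace{D_{\mathrm{KL}}\bigl(p^{*} \,\|\, \bar p\bigr)}_{\text{data bias}}
  + \underbrace{\mathbb{E}_{D}\!\left[D_{\mathrm{KL}}\bigl(\bar p \,\|\, p_{\hat\theta(D)}\bigr)\right]}_{\text{variance}}
```

On this reading, the generalized Pythagorean theorem splits off the model-error term at p*, and the dual e-mixture variance identity splits the remaining expected divergence into a bias term at p̄ and a variance term around it.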

Abstract

We decompose the Kullback–Leibler generalization error (GE) of unsupervised learning, the expected KL divergence from the data distribution to the trained model, into three non-negative components: model error, data bias, and variance. The decomposition is exact for any e-flat model class and follows from two identities of information geometry: the generalized Pythagorean theorem and a dual e-mixture variance identity. As an analytically tractable demonstration, we apply the framework to ε-PCA, a regularized principal component analysis in which the empirical covariance is truncated at rank N_K and the discarded directions are pinned at a fixed noise floor ε. Although rank-constrained ε-PCA is not itself e-flat, it admits a technical reformulation with the same total GE on isotropic Gaussian data, under which each component of the decomposition takes closed form. The optimal rank emerges as the cutoff λ*_cut = ε: the model retains exactly those empirical eigenvalues exceeding the noise floor, with the cutoff reflecting a marginal-rate balance between model-error gain and data-bias cost. A boundary comparison further yields a three-regime phase diagram (retain-all, interior, and collapse) separated by the lower Marchenko–Pastur edge and an analytically computable collapse threshold ε*(α), where α is the dimension-to-sample-size ratio. All claims are verified numerically.
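
As a concrete illustration of the machinery above, here is a minimal numerical sketch (our code, not the authors'): it forms the ε-PCA covariance estimate by keeping empirical eigenvalues above the noise floor ε and pinning the rest at ε, then compares ε against the lower Marchenko–Pastur edge (1 − √α)² for unit-variance isotropic Gaussian data. The function names and the unit-variance normalization are our assumptions; note that under this naive keep-if-above rule, collapse occurs once ε exceeds the largest empirical eigenvalue (near the upper MP edge), which need not coincide with the paper's analytic threshold ε*(α).

```python
import numpy as np

def eps_pca_covariance(X, eps):
    """epsilon-PCA covariance estimate (sketch, not the authors' code):
    empirical eigenvalues above the noise floor eps are retained;
    the discarded directions are pinned at eps."""
    n, d = X.shape
    S = X.T @ X / n                            # empirical covariance (mean assumed zero)
    evals, evecs = np.linalg.eigh(S)
    kept = evals > eps                         # cutoff rule lambda*_cut = eps
    evals_reg = np.where(kept, evals, eps)     # pin discarded eigenvalues at eps
    Sigma_hat = (evecs * evals_reg) @ evecs.T  # reassemble V diag(lambda) V^T
    return Sigma_hat, int(kept.sum())

# Demo on isotropic Gaussian data with identity covariance.
rng = np.random.default_rng(0)
n, d = 2000, 500                               # alpha = d / n = 0.25
alpha = d / n
X = rng.standard_normal((n, d))

mp_lower_edge = (1 - np.sqrt(alpha)) ** 2      # lower Marchenko-Pastur edge (unit variance)

for eps in [0.05, 1.0, 3.0]:
    _, rank = eps_pca_covariance(X, eps)
    if eps < mp_lower_edge:
        regime = "retain-all (eps below the MP lower edge)"
    elif rank == 0:
        regime = "collapse (all directions pinned at eps)"
    else:
        regime = "interior (partial truncation)"
    print(f"eps={eps:4.2f}  retained rank={rank:3d}/{d}  -> {regime}")
```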