Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization

arXiv stat.ML / 4/6/2026


Key Points

  • The paper analyzes high-dimensional convex empirical risk minimization (ERM) under non-Gaussian data designs and studies how Gaussian universality breaks down.
  • By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) beyond Gaussian settings, the authors derive an asymptotic min-max characterization of ERM statistics.
  • The results show that, for a test covariate independent of the training data, the projection \(\hat{\theta}^\top x\) approximately follows the convolution of the (possibly non-Gaussian) distribution of \(\mu_{\hat{\theta}}^\top x\) with an independent centered Gaussian whose variance is \(\operatorname{Tr}(C_{\hat{\theta}}\mathbb{E}[xx^\top])\).
  • The paper clarifies the limits of Gaussian universality and provides an asymptotic equivalence result stating that any \(\mathcal{C}^2\) regularizer behaves like a quadratic form determined only by its Hessian at zero and gradient at \(\mu_{\hat{\theta}}\).
  • Numerical simulations across multiple losses and model settings are used to validate the theoretical approximations and to illustrate qualitative implications.
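The convolution claim in the key points can be checked numerically in a toy setting. The sketch below is an illustrative assumption on our part, not the paper's experiments: it uses ridge regression (a convex ERM with square loss and quadratic regularizer) on a Rademacher design, estimates \(\mu_{\hat{\theta}}\) and \(C_{\hat{\theta}}\) over many independent training sets, and compares the empirical variance of \(\hat{\theta}^\top x - \mu_{\hat{\theta}}^\top x\) on fresh covariates against the predicted \(\operatorname{Tr}(C_{\hat{\theta}}\mathbb{E}[xx^\top])\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, trials = 200, 50, 300
lam = 1.0
theta_star = rng.standard_normal(d) / np.sqrt(d)  # fixed ground truth

thetas = np.empty((trials, d))
for t in range(trials):
    # Non-Gaussian design: Rademacher entries (zero mean, unit variance)
    A = rng.choice([-1.0, 1.0], size=(n, d))
    y = A @ theta_star + rng.standard_normal(n)
    # Ridge ERM has a closed form: (A^T A + lam I)^{-1} A^T y
    thetas[t] = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)

mu = thetas.mean(axis=0)           # Monte Carlo estimate of mu_theta
C = np.cov(thetas, rowvar=False)   # Monte Carlo estimate of C_theta

# Fresh test covariates from the same law; here E[x x^T] = I
x = rng.choice([-1.0, 1.0], size=(trials, d))
proj = np.einsum('td,td->t', thetas, x)  # theta_hat^T x, one per trial
fluct = proj - x @ mu                    # fluctuation around mu^T x

pred_var = np.trace(C)  # Tr(C_theta E[xx^T]) with E[xx^T] = I
print(fluct.var(), pred_var)
```

With these sizes the empirical fluctuation variance lands close to the predicted trace; the Gaussianity of the fluctuation itself can be inspected with a histogram or Q-Q plot.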

Abstract

We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian settings, we derive an asymptotic min-max characterization of key statistics, enabling approximation of the mean \(\mu_{\hat{\theta}}\) and covariance \(C_{\hat{\theta}}\) of the ERM estimator \(\hat{\theta}\). Specifically, under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, we show that for a test covariate \(x\) independent of the training data, the projection \(\hat{\theta}^\top x\) approximately follows the convolution of the (generally non-Gaussian) distribution of \(\mu_{\hat{\theta}}^\top x\) with an independent centered Gaussian variable of variance \(\operatorname{Tr}(C_{\hat{\theta}}\mathbb{E}[xx^\top])\). This result clarifies the scope and limits of Gaussian universality for ERMs. Additionally, we prove that any \(\mathcal{C}^2\) regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and gradient at \(\mu_{\hat{\theta}}\). Numerical simulations across diverse losses and models are provided to validate our theoretical predictions and qualitative insights.
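As a schematic illustration of the regularizer equivalence (our reading of the stated result; the paper's precise scaling and error terms may differ), a second-order expansion of a \(\mathcal{C}^2\) regularizer \(r\) around \(\mu_{\hat{\theta}}\), with the curvature taken at the origin, has the form:

```latex
% Hedged sketch: gradient evaluated at \mu_{\hat\theta}, Hessian at zero,
% matching the quantities the equivalence is stated to depend on.
r(\theta) \;\approx\;
r(\mu_{\hat{\theta}})
+ \nabla r(\mu_{\hat{\theta}})^{\top}\,(\theta - \mu_{\hat{\theta}})
+ \tfrac{1}{2}\,(\theta - \mu_{\hat{\theta}})^{\top}\,
  \nabla^{2} r(0)\,(\theta - \mu_{\hat{\theta}})
```

so that, asymptotically, only the gradient \(\nabla r(\mu_{\hat{\theta}})\) and the Hessian \(\nabla^2 r(0)\) enter the characterization of \(\hat{\theta}\).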
