Random Matrix Theory of Early-Stopped Gradient Flow: A Transient BBP Scenario

arXiv stat.ML / 4/21/2026


Key Points

  • The paper introduces an analytically solvable random-matrix model for gradient flow in a linear teacher–student setting that reproduces a transient “detectable signal” regime before overfitting takes over.
  • It shows that learning occurs when an isolated eigenvalue separates from a noisy spectral bulk, and that this eigenvalue may never emerge, emerge and persist, or emerge and later be reabsorbed, depending on training time, signal strength, and covariance anisotropy.
  • A central mechanism is anisotropy in the input covariance, which creates fast and slow directions in the learning dynamics and leads to a time-varying bulk spectrum.
  • Using a two-block covariance model, the authors derive the full time-dependent bulk spectrum from a 2×2 Dyson equation and an explicit outlier condition for a rank-one teacher from a rank-two determinant formula, which together yield a time-dependent, transient BBP transition.
  • The theory is mapped into phase diagrams and validated against finite-size simulations; the authors propose it as a minimal explanation of early stopping as a transient spectral effect driven by anisotropy and noise.
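
For readers unfamiliar with the spectral picture in the key points, the sketch below illustrates the textbook static analogue: a rank-one spike of strength theta added to a symmetric Gaussian noise matrix whose bulk fills [-2, 2]. In that standard setting the top eigenvalue detaches from the bulk only when theta > 1, and then sits near theta + 1/theta. This is not the paper's time-dependent model; the function name, matrix size, and seed are illustrative assumptions.

```python
import numpy as np

def top_eigenvalue_spiked_wigner(theta, n=400, seed=0):
    """Largest eigenvalue of theta * v v^T + W for a GOE-like noise matrix W.

    Illustrative static example only (an assumption, not the paper's model).
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, n))
    # Symmetrize so off-diagonal entries have variance 1/n:
    # the bulk then converges to the semicircle law on [-2, 2].
    w = (g + g.T) / np.sqrt(2.0 * n)
    # Random unit-norm signal direction.
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    m = theta * np.outer(v, v) + w
    # eigvalsh returns eigenvalues in ascending order.
    return np.linalg.eigvalsh(m)[-1]

if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0, 3.0):
        predicted = theta + 1.0 / theta if theta > 1.0 else 2.0
        observed = top_eigenvalue_spiked_wigner(theta)
        print(f"theta={theta:.1f}  observed={observed:.3f}  large-n prediction={predicted:.3f}")
```

At finite n the observed values fluctuate around the large-n predictions, so agreement is approximate. The paper's contribution, as summarized above, is that the analogous threshold becomes time-dependent under gradient flow with an anisotropic input covariance.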

Abstract

Empirical studies of trained models often report a transient regime in which signal is detectable in a finite gradient descent time window before overfitting dominates. We provide an analytically tractable random-matrix model that reproduces this phenomenon for gradient flow in a linear teacher–student setting. In this framework, learning occurs when an isolated eigenvalue separates from a noisy bulk, before eventually disappearing in the overfitting regime. The key ingredient is anisotropy in the input covariance, which induces fast and slow directions in the learning dynamics. In a two-block covariance model, we derive the full time-dependent bulk spectrum of the symmetrized weight matrix through a 2×2 Dyson equation, and we obtain an explicit outlier condition for a rank-one teacher via a rank-two determinant formula. This yields a transient Baik–Ben Arous–Péché (BBP) transition: depending on signal strength and covariance anisotropy, the teacher spike may never emerge, emerge and persist, or emerge only during an intermediate time interval before being reabsorbed into the bulk. We map the corresponding phase diagrams and validate the theory against finite-size simulations. Our results provide a minimal solvable mechanism for early stopping as a transient spectral effect driven by anisotropy and noise.
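
The setting described in the abstract can be explored numerically with a minimal sketch like the one below. Everything specific here is an assumption, since the abstract does not fix the details: a square student matrix W trained from W(0) = 0 on the squared loss (1/2n)‖Y − WX‖², a rank-one teacher theta·u uᵀ, additive Gaussian label noise, and a diagonal two-block input covariance with variances a (fast directions) and b (slow directions). Under these assumptions gradient flow has the closed form W(t) = C f(S), with S = XXᵀ/n, C = YXᵀ/n and f(s) = (1 − e^(−st))/s, so no time stepping is needed. All function and parameter names are hypothetical.

```python
import numpy as np

def gradient_flow_weights(X, Y, t):
    """Closed-form W(t) solving dW/dt = C - W S with W(0) = 0.

    S = X X^T / n is the empirical input covariance, C = Y X^T / n.
    """
    n = X.shape[1]
    S = X @ X.T / n
    C = Y @ X.T / n
    evals, evecs = np.linalg.eigh(S)
    # f(s) = (1 - exp(-s t)) / s, using the limit f(0) = t for tiny eigenvalues.
    positive = evals > 1e-12
    safe = np.where(positive, evals, 1.0)
    f = np.where(positive, -np.expm1(-safe * t) / safe, t)
    # C V diag(f) V^T: scale the eigenvector columns by f, then rotate back.
    return C @ (evecs * f) @ evecs.T

def make_problem(d=60, n=240, a=4.0, b=0.25, theta=3.0, noise=1.0, seed=0):
    """Assumed teacher-student data with a two-block diagonal input covariance."""
    rng = np.random.default_rng(seed)
    variances = np.concatenate([np.full(d // 2, a), np.full(d - d // 2, b)])
    X = np.sqrt(variances)[:, None] * rng.standard_normal((d, n))
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    W_star = theta * np.outer(u, u)              # rank-one teacher
    Y = W_star @ X + noise * rng.standard_normal((d, n))
    return X, Y, u

def top_eig_and_overlap(W, u):
    """Top eigenvalue of the symmetrized weights and squared overlap with u."""
    vals, vecs = np.linalg.eigh(0.5 * (W + W.T))
    return vals[-1], float(vecs[:, -1] @ u) ** 2

if __name__ == "__main__":
    X, Y, u = make_problem()
    for t in (0.01, 0.1, 1.0, 10.0, 100.0):
        lam, overlap = top_eig_and_overlap(gradient_flow_weights(X, Y, t), u)
        print(f"t={t:7.2f}  top eigenvalue={lam:7.3f}  squared overlap={overlap:.3f}")
```

Scanning t while varying theta, the noise level, and the ratio a/b is one way to probe for the three regimes named in the abstract. Whether a given parameter choice shows the spike emerging and then being reabsorbed depends on normalizations that this sketch only guesses at, so it should be read as an exploration tool rather than a reproduction of the paper's phase diagrams.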