Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks

arXiv stat.ML / 5/1/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper analyzes high-dimensional scaling limits of online stochastic gradient descent (SGD), focusing on how the step size governs the effective dynamics.
  • It identifies a critical step-size scaling for single-layer networks, at which the dynamics transition from deterministic (ballistic) behavior to a qualitatively new regime (a toy illustration follows this list).
  • At the critical scaling, the authors show that an additional correction term appears, altering the system’s phase diagram compared with purely deterministic limits.
  • Near fixed points, under certain conditions, the diffusive (SDE) limits of the effective dynamics simplify to an Ornstein–Uhlenbeck process.
  • The results connect the “information exponent” to sample complexity and argue that deterministic scaling limits cannot fully capture stochastic fluctuations in high-dimensional learning.
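
To make the step-size scaling concrete, here is a minimal toy sketch of online SGD on a single-index model, with the step size scaled as δ = c/d. The quadratic link (information exponent k = 2), the choice of constants c, and the dimensions are illustrative assumptions, not the paper's construction; the point is only that small c tracks a near-deterministic drift of the overlap, while larger c makes stochastic fluctuations visible.

```python
import numpy as np

def online_sgd_overlap(d, n_steps, c, rng):
    """Toy online SGD for a single-index model y = f(<theta_*, x>).

    Step size is scaled as delta = c/d, the regime discussed in the key
    points. The quadratic link (information exponent k = 2) is an
    illustrative choice, not taken from the paper.
    """
    theta_star = np.zeros(d)
    theta_star[0] = 1.0                      # ground-truth direction on the sphere
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)           # random init, overlap ~ 1/sqrt(d)
    delta = c / d                            # step-size scaling delta_d = c/d
    f = lambda z: z ** 2                     # link function, f'(z) = 2z
    overlaps = np.empty(n_steps)
    for t in range(n_steps):
        x = rng.standard_normal(d)           # fresh sample each step (online SGD)
        y = f(x @ theta_star)
        z = x @ theta
        grad = 2.0 * (f(z) - y) * (2.0 * z) * x   # gradient of the squared loss
        theta -= delta * grad
        theta /= np.linalg.norm(theta)       # project back onto the sphere
        overlaps[t] = theta @ theta_star     # overlap m_t = <theta_t, theta_*>
    return overlaps

rng = np.random.default_rng(0)
for c in (0.1, 1.0, 5.0):
    m = online_sgd_overlap(d=500, n_steps=20_000, c=c, rng=rng)
    print(f"c = {c}: |final overlap| = {abs(m[-1]):.3f}")
```

Because the link z² cannot distinguish ±θ*, the script reports the absolute overlap; the trajectory, not the endpoint, is what the scaling limits describe.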

Abstract

This paper studies the high-dimensional scaling limits of online stochastic gradient descent (SGD). Building on the recent work of Ben Arous, Gheissari, and Jagannath on the effective dynamics of SGD, we study the critical scaling regime of the step size for single-layer networks. Below this critical regime, the effective dynamics are governed by deterministic (ballistic) limits, whereas at the critical scale, a new correction term emerges that changes the phase diagram. In this regime, near fixed points, the corresponding diffusive (SDE) limits of the effective dynamics reduce to an Ornstein–Uhlenbeck process under certain conditions. These results highlight how the information exponent controls sample complexity and illustrate the limitations of deterministic scaling limits in capturing stochastic fluctuations in high-dimensional learning dynamics.
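
For reference, here is a minimal LaTeX sketch of the two objects named in the abstract: a generic Ornstein–Uhlenbeck limit near a fixed point, and the standard definition of the information exponent. The coefficients λ and σ and the sample-complexity heuristic are generic placeholders from the single-index literature, not values taken from this paper.

```latex
% (i) Generic Ornstein--Uhlenbeck limit for the rescaled fluctuation X_t
%     near a fixed point; lambda (mean-reversion rate) and sigma (noise
%     level) are placeholder coefficients.
\[
  \mathrm{d}X_t \;=\; -\lambda X_t \,\mathrm{d}t \;+\; \sigma \,\mathrm{d}B_t,
  \qquad \lambda,\ \sigma > 0.
\]

% (ii) Information exponent of a link function f with Hermite expansion
%      f(z) = \sum_{j \ge 0} c_j h_j(z): the lowest surviving non-constant
%      Hermite coefficient.
\[
  k \;=\; \min\{\, j \ge 1 \;:\; c_j \neq 0 \,\}.
\]

% Heuristic from the single-index literature (up to logarithmic factors):
% online SGD needs on the order of
\[
  n \;\gtrsim\; d^{\max(k-1,\,1)}
\]
% samples to escape the high-dimensional "equator" and recover the signal.
```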