AI Navigate

Large Spikes in Stochastic Gradient Descent: A Large-Deviations View

arXiv cs.LG / 3/12/2026

Key Points

  • The paper provides a quantitative theory of the catapult phase in SGD training of a shallow network under NTK scaling.
  • It identifies an explicit criterion G, depending on the kernel, the learning rate, and the data, that separates two regimes of behavior: G > 0 yields large NTK-flattening spikes with high probability, while G < 0 makes the spike probability decay like (n/η)^{-ϑ/2} for an explicitly characterized ϑ ∈ (0, ∞) (restated symbolically after this list).
  • This yields a concrete, parameter-dependent explanation for why such spikes can still be observed at practical network widths.
  • The analysis employs a large-deviations viewpoint to characterize spike probabilities and relate kernel dynamics to training hyperparameters.
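
In symbols, using the abstract's notation and reading n as the network width (the summary leaves n implicit, so this is a gloss rather than a verbatim theorem statement):

```latex
% Regime dichotomy, paraphrased from the abstract; G depends only on
% the kernel, the learning rate \eta, and the data. n is read as the width.
G > 0 \;\Longrightarrow\; \text{large NTK-flattening spikes occur with high probability},
\qquad
G < 0 \;\Longrightarrow\; \mathbb{P}(\text{spike}) \;\sim\; \Big(\tfrac{n}{\eta}\Big)^{-\vartheta/2},
\quad \vartheta \in (0,\infty).
```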

Abstract

We analyse SGD training of a shallow, fully connected network in the NTK scaling and provide a quantitative theory of the catapult phase. We identify an explicit criterion separating two behaviours: when an explicit function G, depending only on the kernel, the learning rate η, and the data, is positive, SGD produces large NTK-flattening spikes with high probability; when G < 0, their probability decays like (n/η)^{-ϑ/2}, for an explicitly characterised ϑ ∈ (0, ∞). This yields a concrete, parameter-dependent explanation for why such spikes may still be observed at practical widths.
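
To make "NTK-flattening spike" concrete, here is a minimal NumPy toy in the spirit of the catapult literature: a two-layer linear model under NTK scaling fit to a single target, where the loss spikes and the scalar NTK flattens once the learning rate crosses the stability threshold. The model, the one-point dataset, and the thresholds in the comments are illustrative stand-ins, not the paper's construction (which treats SGD on a shallow fully connected network).

```python
import numpy as np

def catapult_demo(eta, m=512, steps=300, y=1.0, seed=0):
    """Toy catapult: f(u, v) = u.v / sqrt(m) fit to a single target y
    by gradient descent. The scalar NTK of this model is
    lam = (|u|^2 + |v|^2) / m; a spike flattens it when eta * lam > 2."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(m)
    v = rng.standard_normal(m)
    losses, ntks = [], []
    for _ in range(steps):
        err = u @ v / np.sqrt(m) - y          # residual f - y
        lam = (u @ u + v @ v) / m             # scalar NTK
        losses.append(0.5 * err ** 2)
        ntks.append(lam)
        if not np.isfinite(err) or losses[-1] > 1e12:
            break                             # diverged: stop early
        # dL/du = err * v / sqrt(m), dL/dv = err * u / sqrt(m)
        u, v = (u - eta * err * v / np.sqrt(m),
                v - eta * err * u / np.sqrt(m))
    return np.array(losses), np.array(ntks)

if __name__ == "__main__":
    # At init lam is roughly 2, so eta ~ 0.5 converges monotonically,
    # eta ~ 1.5 spikes and then recovers with a flatter NTK, and
    # eta ~ 2.5 diverges.
    for eta in (0.5, 1.5, 2.5):
        losses, ntks = catapult_demo(eta)
        print(f"eta={eta}: peak loss {losses.max():.3g}, "
              f"NTK {ntks[0]:.2f} -> {ntks[-1]:.2f}")
```

In the intermediate regime the residual first grows geometrically, then the O(1/m) feedback of the error onto the kernel kicks in and drives the NTK below the stability threshold 2/η, after which the loss converges: the qualitative spike-then-flatten behaviour the paper quantifies.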