Generalization at the Edge of Stability
arXiv cs.LG / 4/22/2026
💬 Opinion · Models & Research
Key Points
- The paper studies why training neural networks with large learning rates “at the edge of stability” can improve generalization, despite optimization dynamics becoming oscillatory or chaotic.
- It models stochastic optimizers as random dynamical systems, showing they can converge to fractal attractor sets with lower intrinsic dimension rather than single points.
- Building on Lyapunov dimension ideas, the authors introduce a new metric called “sharpness dimension” and derive a generalization bound tied to this quantity.
- The bound depends on the full Hessian spectrum and the structure of its partial determinants, indicating that neither trace nor spectral norm alone explains generalization in the chaotic regime.
- Experiments on multiple MLPs and transformers support the theory and provide additional insight into “grokking,” the recently observed phenomenon in which test accuracy improves long after training accuracy has saturated.
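The Lyapunov-dimension idea the sharpness dimension builds on can be made concrete with the classic Kaplan–Yorke formula, which assigns a fractional dimension to an attractor from its sorted Lyapunov exponents. This is an illustrative sketch only: the function name and example exponents are hypothetical, and the paper's sharpness dimension is a related but distinct quantity derived from the Hessian spectrum.

```python
import numpy as np

def kaplan_yorke_dimension(exponents):
    """Kaplan-Yorke (Lyapunov) dimension from a set of Lyapunov exponents.

    Illustrative sketch: find the largest j such that the partial sum
    of the j largest exponents is still non-negative, then interpolate
    into the next exponent. Not the paper's exact "sharpness dimension".
    """
    lam = np.sort(np.asarray(exponents, dtype=float))[::-1]  # descending order
    partial_sums = np.cumsum(lam)
    nonneg = np.where(partial_sums >= 0)[0]
    if len(nonneg) == 0:
        return 0.0          # all contractions: attractor collapses to a point
    j = nonneg[-1] + 1      # number of exponents in the last non-negative sum
    if j == len(lam):
        return float(j)     # partial sums never go negative: full dimension
    # fractional part: leftover expansion divided by the next contraction rate
    return j + partial_sums[j - 1] / abs(lam[j])
```

For example, exponents `[0.5, -0.2, -1.0]` (one expanding, two contracting directions, as in a Lorenz-like system) give a fractal dimension of 2.3, capturing the intuition that chaotic dynamics can settle onto a set of lower intrinsic dimension than the ambient space.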