Abstract
We develop the spectral edge thesis: phase transitions in neural network training -- grokking, capability gains, loss plateaus -- are controlled by the spectral gap of the rolling-window Gram matrix of parameter updates. In the extreme-aspect-ratio regime (parameters P \sim 10^8, window W \sim 10), the classical BBP detection threshold is vacuous; the operative structure is the intra-signal gap separating dominant from subdominant modes at position k^* = \mathrm{argmax}_j\, \sigma_j/\sigma_{j+1}.
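As a concrete illustration (not part of the paper's own material), a minimal NumPy sketch of locating k^* from a window of flattened parameter updates; the function name gap_position and the toy dimensions are assumptions made for the example.

```python
import numpy as np

def gap_position(updates):
    """Locate k* = argmax_j sigma_j / sigma_{j+1} for a rolling window of updates.

    `updates` is a (W, P) array: W recent parameter-update vectors of dimension P,
    with P >> W (the extreme-aspect-ratio regime). The singular values of this
    matrix are the square roots of the eigenvalues of the W x W Gram matrix,
    so the thin SVD avoids ever forming a P x P object.
    """
    # Thin SVD: only W singular values exist since W << P; returned in descending order.
    sigma = np.linalg.svd(updates, compute_uv=False)
    # Ratios of consecutive singular values; the largest ratio marks the
    # intra-signal gap separating dominant from subdominant modes.
    ratios = sigma[:-1] / sigma[1:]
    k_star = int(np.argmax(ratios)) + 1  # 1-indexed gap position
    return k_star, float(ratios[k_star - 1])

# Toy usage (dimensions illustrative only):
# rng = np.random.default_rng(0)
# updates = rng.normal(size=(10, 100_000))
# k_star, gap = gap_position(updates)
```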
From three axioms we derive: (i) gap dynamics governed by a Dyson-type ODE with curvature asymmetry, damping, and gradient driving; (ii) a spectral loss decomposition linking each mode's learning contribution to its Davis--Kahan stability coefficient; (iii) the Gap Maximality Principle, showing that k^* is the unique dynamically privileged position -- its collapse is the only gap collapse that disrupts learning, and it sustains itself through an \alpha-feedback loop requiring no assumptions about the optimizer. The adiabatic parameter \mathcal{A} = \|\Delta G\|_F / (\eta\, g^2) controls circuit stability: \mathcal{A} \ll 1 (plateau), \mathcal{A} \sim 1 (phase transition), \mathcal{A} \gg 1 (forgetting).
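A corresponding sketch of the adiabatic parameter and its regime classification, under stated assumptions: the abstract does not expand \Delta G, \eta, or g, so here \Delta G is read as the change between consecutive rolling-window Gram matrices, \eta as the learning rate, and g as the spectral gap at k^*; the numeric thresholds standing in for \ll 1 and \gg 1 are illustrative.

```python
import numpy as np

def adiabatic_parameter(G_prev, G_curr, lr, gap):
    """Compute A = ||Delta G||_F / (eta * g^2) and classify the regime.

    G_prev, G_curr : consecutive W x W rolling-window Gram matrices (assumed reading).
    lr             : learning rate eta.
    gap            : spectral gap g, read here as the gap at k* (an assumption).
    """
    A = np.linalg.norm(G_curr - G_prev, ord='fro') / (lr * gap**2)
    # Thresholds 0.1 and 10 are illustrative stand-ins for A << 1 and A >> 1.
    if A < 0.1:
        regime = 'plateau'            # A << 1: circuits stable, little change
    elif A > 10.0:
        regime = 'forgetting'         # A >> 1: circuits destroyed
    else:
        regime = 'phase transition'   # A ~ 1
    return A, regime
```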
We test the framework across six model families (150K--124M parameters): gap dynamics precede every grokking event (24/24 with weight decay, 0/24 without), the gap position is optimizer-dependent (Muon: k^*=1, AdamW: k^*=2 on the same model), and 19/20 quantitative predictions are confirmed. The framework is consistent with the edge of stability, Tensor Programs, Dyson Brownian motion, the Lottery Ticket Hypothesis, and neural scaling laws.