Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training

arXiv cs.AI / 4/1/2026


Key Points

  • The paper proposes “beta-scheduling,” a time-varying momentum schedule derived from a critically damped harmonic oscillator: μ(t) = 1 − 2√α(t), where α(t) is the current learning rate. The schedule introduces no extra free parameters.
  • Experiments on ResNet-18/CIFAR-10 show the beta-schedule reaches 90% accuracy about 1.9× faster (in fewer training steps) than constant momentum (0.9).
  • The method provides a cross-optimizer invariant diagnostic signal: per-layer gradient attribution identifies the same three problematic layers whether the model is trained with SGD or Adam.
  • Using this localization, “surgical correction” of only the identified layers fixes 62 misclassifications while retraining just 18% of parameters, indicating targeted repair potential.
  • A hybrid approach (physics-based momentum early, constant momentum later) achieves the fastest path to 95% accuracy among several compared schedules, combining fast early convergence with stable final refinement.
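The momentum rule in the first bullet is straightforward to sketch. Below is a minimal illustration pairing μ(t) = 1 − 2√α(t) with a cosine-annealed learning rate; the cosine schedule, its hyperparameters, the step count, and the clamping to [0, 0.999] are my assumptions for illustration, not details from the paper:

```python
import math

def cosine_lr(t, total_steps, lr_max=0.1, lr_min=1e-4):
    """Hypothetical cosine-annealed learning-rate schedule (assumed, not the paper's)."""
    frac = 0.5 * (1.0 + math.cos(math.pi * t / total_steps))
    return lr_min + (lr_max - lr_min) * frac

def beta_momentum(alpha):
    """Beta-schedule momentum: mu = 1 - 2*sqrt(alpha).

    Clamped to [0, 0.999] so it remains a valid momentum coefficient
    even for large learning rates (clamp bounds are an assumption).
    """
    return max(0.0, min(1.0 - 2.0 * math.sqrt(alpha), 0.999))

# As the learning rate decays, the prescribed momentum rises toward 1.
for t in (0, 5000, 10000):
    a = cosine_lr(t, total_steps=10000)
    print(f"step {t:>5}: lr={a:.5f}  mu={beta_momentum(a):.4f}")
```

Note the coupling: the momentum is fully determined by the learning-rate schedule, which is the sense in which the method adds "zero free parameters."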

Abstract

Standard neural network training uses constant momentum (typically 0.9), a convention dating to 1964 with limited theoretical justification for its optimality. We derive a time-varying momentum schedule from the critically damped harmonic oscillator: μ(t) = 1 − 2√α(t), where α(t) is the current learning rate. This beta-schedule requires zero free parameters beyond the existing learning rate schedule. On ResNet-18/CIFAR-10, beta-scheduling delivers 1.9× faster convergence to 90% accuracy compared to constant momentum. More importantly, the per-layer gradient attribution under this schedule produces a cross-optimizer invariant diagnostic: the same three problem layers are identified regardless of whether the model was trained with SGD or Adam (100% overlap). Surgical correction of only these layers fixes 62 misclassifications while retraining only 18% of parameters. A hybrid schedule -- physics momentum for fast early convergence, then constant momentum for the final refinement -- reaches 95% accuracy fastest among five methods tested. The main contribution is not an accuracy improvement but a principled, parameter-free tool for localizing and correcting specific failure modes in trained networks.
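To see where the μ(t) = 1 − 2√α(t) form can come from, here is one standard route via the heavy-ball/damped-oscillator correspondence, sketched for a normalized quadratic objective; the paper's own derivation may differ in its normalizations:

```latex
% Heavy-ball ODE for a normalized quadratic f(x) = \tfrac{1}{2}x^2,
% so \nabla f(x) = x; critical damping occurs at c = 2:
\ddot{x} + c\,\dot{x} + \nabla f(x) = 0
% Finite-difference discretization with time step h:
\frac{x_{t+1} - 2x_t + x_{t-1}}{h^2} + c\,\frac{x_t - x_{t-1}}{h} + \nabla f(x_t) = 0
% Rearranging recovers the heavy-ball (momentum) update:
x_{t+1} = x_t + \underbrace{(1 - c\,h)}_{\mu}\,(x_t - x_{t-1})
        - \underbrace{h^2}_{\alpha}\,\nabla f(x_t)
% Hence h = \sqrt{\alpha}, and imposing critical damping c = 2 gives
\mu(t) = 1 - 2h = 1 - 2\sqrt{\alpha(t)}
```

The same identification explains the diagnostic reading: a layer whose effective curvature deviates from the normalization is under- or over-damped at the prescribed μ(t), which is the kind of per-layer signal the paper attributes gradients against.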