Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training

arXiv cs.AI / 4/1/2026


Key Points

  • The paper proposes “beta-scheduling,” a time-varying momentum schedule derived from a critically damped harmonic oscillator: μ(t) = 1 − 2√α(t), where α(t) is the current learning rate. The schedule introduces no extra free parameters.
  • Experiments on ResNet-18/CIFAR-10 show the beta-schedule reaches 90% accuracy about 1.9× faster (in fewer training steps) than constant momentum (0.9).
  • The method provides a cross-optimizer invariant diagnostic signal: per-layer gradient attribution identifies the same three problematic layers whether the model is trained with SGD or Adam.
  • Using this localization, “surgical correction” of only the identified layers fixes 62 misclassifications while retraining just 18% of parameters, indicating targeted repair potential.
  • A hybrid approach (physics-based momentum early, constant momentum later) achieves the fastest path to 95% accuracy among several compared schedules, combining fast early convergence with stable final refinement.
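The momentum rule in the first bullet is straightforward to sketch. Below is a minimal illustration pairing μ(t) = 1 − 2√α(t) with a cosine-annealed learning rate; the cosine schedule, its hyperparameters, the step count, and the clamping to [0, 0.999] are my assumptions for illustration, not details from the paper:

```python
import math

def cosine_lr(t, total_steps, lr_max=0.1, lr_min=1e-4):
    """Hypothetical cosine-annealed learning-rate schedule (assumed, not the paper's)."""
    frac = 0.5 * (1.0 + math.cos(math.pi * t / total_steps))
    return lr_min + (lr_max - lr_min) * frac

def beta_momentum(alpha):
    """Beta-schedule momentum: mu = 1 - 2*sqrt(alpha).

    Clamped to [0, 0.999] so it remains a valid momentum coefficient
    even for large learning rates (clamp bounds are an assumption).
    """
    return max(0.0, min(1.0 - 2.0 * math.sqrt(alpha), 0.999))

# As the learning rate decays, the prescribed momentum rises toward 1.
for t in (0, 5000, 10000):
    a = cosine_lr(t, total_steps=10000)
    print(f"step {t:>5}: lr={a:.5f}  mu={beta_momentum(a):.4f}")
```

Note the coupling: the momentum is fully determined by the learning-rate schedule, which is the sense in which the method adds "zero free parameters."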

Abstract

Standard neural network training uses constant momentum (typically 0.9), a convention dating to 1964 with limited theoretical justification for its optimality. We derive a time-varying momentum schedule from the critically damped harmonic oscillator: μ(t) = 1 − 2√α(t), where α(t) is the current learning rate. This beta-schedule requires zero free parameters beyond the existing learning rate schedule. On ResNet-18/CIFAR-10, beta-scheduling delivers 1.9× faster convergence to 90% accuracy compared to constant momentum. More importantly, the per-layer gradient attribution under this schedule produces a cross-optimizer invariant diagnostic: the same three problem layers are identified regardless of whether the model was trained with SGD or Adam (100% overlap). Surgical correction of only these layers fixes 62 misclassifications while retraining only 18% of parameters. A hybrid schedule -- physics momentum for fast early convergence, then constant momentum for the final refinement -- reaches 95% accuracy fastest among five methods tested. The main contribution is not an accuracy improvement but a principled, parameter-free tool for localizing and correcting specific failure modes in trained networks.
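To see where the μ(t) = 1 − 2√α(t) form can come from, here is one standard route via the heavy-ball/damped-oscillator correspondence, sketched for a normalized quadratic objective; the paper's own derivation may differ in its normalizations:

```latex
% Heavy-ball ODE for a normalized quadratic f(x) = \tfrac{1}{2}x^2,
% so \nabla f(x) = x; critical damping occurs at c = 2:
\ddot{x} + c\,\dot{x} + \nabla f(x) = 0
% Finite-difference discretization with time step h:
\frac{x_{t+1} - 2x_t + x_{t-1}}{h^2} + c\,\frac{x_t - x_{t-1}}{h} + \nabla f(x_t) = 0
% Rearranging recovers the heavy-ball (momentum) update:
x_{t+1} = x_t + \underbrace{(1 - c\,h)}_{\mu}\,(x_t - x_{t-1})
        - \underbrace{h^2}_{\alpha}\,\nabla f(x_t)
% Hence h = \sqrt{\alpha}, and imposing critical damping c = 2 gives
\mu(t) = 1 - 2h = 1 - 2\sqrt{\alpha(t)}
```

The same identification explains the diagnostic reading: a layer whose effective curvature deviates from the normalization is under- or over-damped at the prescribed μ(t), which is the kind of per-layer signal the paper attributes gradients against.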