Curvature-Aligned Probing for Local Loss-Landscape Stabilization

arXiv cs.LG · April 17, 2026

📰 News · Models & Research

Key Points

  • The paper reframes local loss-landscape stabilization as an observational problem and proposes a unified family of probing criteria controlled by aggregation order and probing distribution.
  • It introduces a curvature-aligned criterion (Δ₂^(D)) that evaluates loss increments by projecting onto the top-D eigenspace of the empirical Hessian near a trained solution, targeting dominant anisotropic deformation directions.
  • The authors theoretically show that, using only a local quadratic loss model, Δ₂^(D) maintains the O(k^{-2}) mean-squared convergence rate of a full-parameter-space criterion while replacing dependence on ambient dimension with dependence on the chosen subspace dimension D.
  • They derive efficient, scalable estimators based on Hessian-vector products, subspace Monte Carlo sampling, and a closed-form Gaussian-moment proxy with explicit spectral expressions; generic sketches of these ingredients follow this list.
  • Experiments on a decoder-only transformer indicate that probes using only a small fraction of parameter space can reproduce full-space mean-squared stabilization signals within numerical noise, with the closed-form estimator becoming orders of magnitude faster than direct Monte Carlo after subspace construction.
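
The estimator ingredients named above lend themselves to a short illustration. Below is a minimal, hypothetical PyTorch sketch of the two generic building blocks: Hessian-vector products via double backprop, and a top-D eigenspace built by subspace iteration. This is not the paper's code; all names (`hvp`, `top_d_eigenspace`, the toy loss) are illustrative, and the paper's actual estimators may differ.

```python
# Minimal sketch (assumptions labeled in comments): a dominant D-dimensional
# Hessian eigenspace built from Hessian-vector products, as one generic way
# to construct a curvature-aligned probe subspace. Not the paper's code.
import torch

def hvp(loss, params, vec):
    """Hessian-vector product H @ vec via double backprop."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    gv = torch.dot(flat_grad, vec)
    hv = torch.autograd.grad(gv, params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv]).detach()

def top_d_eigenspace(loss, params, D, iters=50):
    """Orthonormal basis for the dominant D-dimensional Hessian eigenspace
    (largest-magnitude eigenvalues) via subspace iteration with QR."""
    n = sum(p.numel() for p in params)
    V, _ = torch.linalg.qr(torch.randn(n, D))
    for _ in range(iters):
        HV = torch.stack([hvp(loss, params, V[:, j]) for j in range(D)], dim=1)
        V, _ = torch.linalg.qr(HV)
    HV = torch.stack([hvp(loss, params, V[:, j]) for j in range(D)], dim=1)
    eigvals = torch.einsum('nd,nd->d', V, HV)  # Rayleigh quotients
    return V, eigvals

# Toy usage on a small twice-differentiable scalar loss.
torch.manual_seed(0)
w = torch.randn(20, requires_grad=True)
loss = (w**4).sum() + (w**2).sum()
V, lam = top_d_eigenspace(loss, [w], D=5)
```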

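Given such a basis and its eigenvalues, the contrast between subspace Monte Carlo and a closed-form Gaussian-moment proxy can also be sketched. The snippet below assumes a purely quadratic local model diagonalized in the top-D eigenbasis; both function names are hypothetical, and the paper's own proxy may be defined differently.

```python
# Sketch under a local quadratic model: the mean-squared quadratic increment
# E[(1/2 u^T H u)^2] for u = V z, z ~ N(0, sigma^2 I_D), computed two ways.
import torch

def mc_mean_squared_increment(lam, sigma, n_samples=100_000):
    """Subspace Monte Carlo: sample z in the top-D eigenbasis and average."""
    z = sigma * torch.randn(n_samples, lam.numel())
    inc = 0.5 * (z**2 * lam).sum(dim=1)  # 1/2 * sum_i lam_i * z_i^2
    return (inc**2).mean()

def closed_form_mean_squared_increment(lam, sigma):
    """Gaussian moments: E[(1/2 sum_i lam_i z_i^2)^2]
    = (sigma^4 / 4) * ((sum_i lam_i)^2 + 2 * sum_i lam_i^2)."""
    s1, s2 = lam.sum(), (lam**2).sum()
    return (sigma**4 / 4) * (s1**2 + 2 * s2)

lam = torch.tensor([4.0, 2.0, 1.0, 0.5])   # stand-in top-D eigenvalues
print(mc_mean_squared_increment(lam, sigma=0.1))
print(closed_form_mean_squared_increment(lam, sigma=0.1))
```

Once the spectrum is known, the closed form needs no sampling at all, which is consistent with the reported orders-of-magnitude speedup over direct Monte Carlo after subspace construction.
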
Abstract

Local loss-landscape stabilization under sample growth is typically measured either pointwise or through isotropic averaging in the full parameter space. Despite their practical value, both choices probe directions that contribute little to the dominant local deformation of strongly anisotropic neural landscapes. We recast stabilization as an observational problem and introduce a unified family of criteria parameterized by an aggregation order and a probing distribution; within this family we propose a curvature-aligned criterion \Delta_2^{(D)} that probes the loss-increment field in the top-D eigenspace of the empirical Hessian near a trained solution. Working solely from a local quadratic model, we prove that \Delta_2^{(D)} preserves the O(k^{-2}) mean-squared rate of the full-space criterion while replacing ambient-dimension curvature dependence with dependence on the subspace dimension D; a corollary gives a closed-form spectral expression, and a proposition identifies the top-D eigenspace as extremal within the eigenspace-aligned family. We also derive scalable estimators based on Hessian-vector products, subspace Monte Carlo, and a closed-form Gaussian-moment proxy. On a decoder-only transformer, a curvature-aligned probe occupying a tiny fraction of the parameter space already reproduces the full-space mean-squared signal to within numerical noise throughout the validated local regime, and the closed-form estimator is orders of magnitude faster than direct Monte Carlo after subspace construction.
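
For intuition only: under a purely quadratic local model with Hessian eigenpairs (\lambda_i, v_i) and an isotropic Gaussian probe u = \sum_{i=1}^{D} z_i v_i with z_i \sim \mathcal{N}(0, \sigma^2) confined to the top-D eigenspace, a standard Gaussian-moment calculation (not the paper's corollary, which we do not reproduce) gives

\mathbb{E}\big[(\tfrac{1}{2} u^\top H u)^2\big] = \frac{\sigma^4}{4}\left[\Big(\sum_{i=1}^{D} \lambda_i\Big)^{2} + 2\sum_{i=1}^{D} \lambda_i^{2}\right],

so a mean-squared probe of this kind depends on the curvature only through the top-D spectrum. This is the shape of dependence the abstract's closed-form spectral expression suggests, and it explains why such a proxy is cheap to evaluate once the subspace has been built.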