Too Sharp, Too Sure: When Calibration Follows Curvature

arXiv cs.LG / 4/23/2026

Key Points

  • The paper argues that neural-network calibration should be treated as a training-time property rather than a purely post-hoc step.
  • It finds a strong coupling during training between calibration, curvature/sharpness, and classification margins across multiple gradient-based optimization methods.
  • Empirically, Expected Calibration Error (ECE) closely follows curvature-based sharpness as optimization progresses.
  • The authors provide a theoretical link showing that both ECE and Gauss–Newton curvature are governed by the same margin-dependent exponential-tail functional along the training trajectory.
  • Based on this mechanism, they propose a margin-aware training objective that improves out-of-sample calibration and local smoothness across optimizers without reducing accuracy.
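For readers unfamiliar with the metric, here is a minimal sketch of the commonly used equal-width-bin ECE estimator. The function name, the 10-bin default, and the right-closed binning convention are assumptions made for illustration; the paper may use a different binning scheme or estimator.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE estimate (illustrative; not the paper's code).

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges; with right=True a value x lands in bin i
    # when edges[i] < x <= edges[i + 1] (x = 0 falls into bin 0).
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # |accuracy - mean confidence| in the bin, weighted by bin frequency
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap
    return float(ece)


if __name__ == "__main__":
    # Four predictions at 90% confidence, three of them correct:
    # accuracy 0.75 vs. confidence 0.90, so the gap is about 0.15.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

Tracking this quantity on held-out data at regular checkpoints, alongside a sharpness estimate, is one way to observe the kind of co-movement the key points describe.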

Abstract

Modern neural networks can achieve high accuracy while remaining poorly calibrated, producing confidence estimates that do not match empirical correctness. Yet calibration is often treated as a post-hoc attribute. We take a different perspective: we study calibration as a training-time phenomenon on small vision tasks, and ask whether calibrated solutions can be obtained reliably by intervening on the training procedure. We identify a tight coupling between calibration, curvature, and margins during training of deep networks under multiple gradient-based methods. Empirically, Expected Calibration Error (ECE) closely tracks curvature-based sharpness throughout optimization. Mathematically, we show that both ECE and Gauss–Newton curvature are controlled, up to problem-specific constants, by the same margin-dependent exponential tail functional along the trajectory. Guided by this mechanism, we introduce a margin-aware training objective that explicitly targets robust-margin tails and local smoothness, yielding improved out-of-sample calibration across optimizers without sacrificing accuracy.
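To give some intuition for how a margin-dependent exponential tail can control curvature, the following is a standard softmax cross-entropy bound, offered as an illustrative sketch and not as the paper's own derivation. All symbols are assumptions introduced here: K classes, logits z, softmax probabilities p, true label y, margin m, and J the Jacobian of the logits with respect to the parameters.

```latex
\begin{aligned}
m &= z_y - \max_{k \neq y} z_k, \qquad
p_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \\
1 - p_y &= \frac{\sum_{k \neq y} e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}
  \le \sum_{k \neq y} e^{z_k - z_y}
  \le (K-1)\, e^{-m}, \\
\operatorname{tr}\!\big(\operatorname{diag}(p) - p p^{\top}\big)
  &= 1 - \lVert p \rVert_2^2
  \le 1 - p_y^2
  \le 2\,(1 - p_y)
  \le 2\,(K-1)\, e^{-m}, \\
\lambda_{\max}\!\big(J^{\top} (\operatorname{diag}(p) - p p^{\top})\, J\big)
  &\le 2\,(K-1)\, \lVert J \rVert_2^2 \, e^{-m}.
\end{aligned}
```

Averaging the last line over examples bounds the largest Gauss–Newton eigenvalue by a sum of exponential margin tails, with the Jacobian norm playing the role of a problem-specific constant. The same quantity 1 - p_y measures how far the predicted probability of the true label is from 1, which suggests why a confidence-based metric such as ECE could be tied to the same tail. The bound is informative only for positive margins.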