When Flat Minima Fail: Characterizing INT4 Quantization Collapse After FP32 Convergence

arXiv cs.LG / 4/17/2026


Key Points

  • The study challenges the common PTQ assumption that a well-converged FP32 model is automatically “quantization-ready” by showing INT4 can catastrophically collapse even after FP32 convergence.
  • Using a calibration-free per-group INT4 probe across 154 public Pythia-160m checkpoints, the authors identify a three-phase behavior: rapid improvement, a ~70k-step metastable plateau, and an explosive INT4 divergence where the INT4 gap grows from 11% to 517% while FP32 perplexity barely changes.
  • The onset of divergence coincides with the point at which FP32 perplexity converges, a finer-grained trigger suggesting that post-convergence weight updates, rather than learning-rate decay magnitude alone, are the proximate cause.
  • The phenomenon is specific to the coarseness of the 16-level INT4 grid: INT8 is immune across all phases, and weight outlier accumulation is ruled out via direct kurtosis measurements.
  • Controlled fork experiments with different learning-rate schedules show SGDR accelerates divergence in all runs, while the proposed Oscillatory Lock-In (with settled "cool" phases) reduces the INT4 gap by 2.2 percentage points on average, indicating that amplitude calibration, not oscillation itself, determines whether perturbations help or hurt.
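The "calibration-free per-group INT4 probe" named in the points above can be illustrated with a minimal sketch. The group size of 128 and symmetric round-to-nearest scheme here are assumptions for illustration, not the paper's exact implementation; the point is that each group of weights is mapped onto the 16-level INT4 grid using only the group's own maximum magnitude, with no calibration data:

```python
import numpy as np

def quantize_per_group_int4(w, group_size=128):
    """Calibration-free per-group symmetric INT4 quantize/dequantize.

    Illustrative sketch only: group size and rounding scheme are
    assumptions, not the paper's exact probe.
    """
    flat = w.reshape(-1, group_size)
    # Symmetric per-group scale: map max |w| in each group onto [-7, 7]
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero groups
    q = np.clip(np.round(flat / scale), -8, 7)  # 16 INT4 levels
    return (q * scale).reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128)).astype(np.float32)
w_hat = quantize_per_group_int4(w)
rel_err = np.abs(w_hat - w).mean() / np.abs(w).mean()
print(f"mean relative weight error: {rel_err:.3f}")
```

A probe like this makes it cheap to replay quantization across all 154 checkpoints, since the INT4 gap can be measured as the difference between the model's perplexity with `w` versus `w_hat`.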

Abstract

Post-training quantization (PTQ) assumes that a well-converged model is a quantization-ready model. We show this assumption fails in a structured, measurable, and previously uncharacterized way. Using a calibration-free per-group INT4 probe applied to all 154 publicly available Pythia-160m training checkpoints, we identify a three-phase divergence structure: a rapid-learning phase where both FP32 perplexity and quantization robustness improve together, a metastable plateau lasting roughly 70,000 steps where FP32 perplexity stagnates but the INT4 gap remains bounded, and an explosive divergence phase where the INT4 gap compounds from 11% to 517% while FP32 perplexity barely moves. Critically, this divergence begins not when the learning rate starts decaying, but precisely when FP32 perplexity converges: a finer-grained onset predictor implying that post-convergence weight updates, rather than decay magnitude alone, are the proximate cause. We further show that INT8 quantization is entirely immune throughout all three phases, constraining the mechanism to the coarseness of the 16-level INT4 grid specifically, and we rule out weight outlier accumulation as the mechanism via direct kurtosis measurement. Finally, we conduct a controlled fork experiment from the pre-divergence checkpoint comparing three learning-rate schedules (cosine continuation, SGDR warm restarts, and our proposed Oscillatory Lock-In) across nine independent runs. SGDR uniformly accelerates divergence (0/9 pairwise wins against cosine), while OLI's settled cool phases reduce the INT4 gap by 2.2 percentage points on average (t = -5.46, p < 0.0001), demonstrating that schedule amplitude calibration, not oscillation alone, determines whether perturbation helps or hurts. Our code, probe implementation, and all 154-checkpoint audit results are released publicly.
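The abstract's kurtosis check is worth unpacking: if INT4 collapse were driven by outlier accumulation, the weight distribution would grow heavier tails over training, which excess kurtosis detects directly. The sketch below is an illustration of the measurement, not the paper's code; excess (Fisher) kurtosis is near zero for Gaussian-like weights and large and positive for heavy-tailed, outlier-prone ones:

```python
import numpy as np

def excess_kurtosis(w):
    """Fisher (excess) kurtosis: ~0 for a Gaussian, large positive
    values indicate heavy tails, i.e. outlier-prone weights."""
    x = w.ravel().astype(np.float64)
    x = x - x.mean()
    m2 = (x ** 2).mean()  # second central moment (variance)
    m4 = (x ** 4).mean()  # fourth central moment
    return m4 / m2 ** 2 - 3.0

rng = np.random.default_rng(0)
gaussian_w = rng.normal(size=100_000)          # Gaussian-like weights
heavy_w = rng.standard_t(df=5, size=100_000)   # heavy-tailed stand-in

print(f"gaussian-like: {excess_kurtosis(gaussian_w):+.2f}")
print(f"heavy-tailed:  {excess_kurtosis(heavy_w):+.2f}")
```

Flat kurtosis across checkpoints while the INT4 gap explodes is what lets the authors reject outlier accumulation and point instead at the coarseness of the 16-level grid.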