The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training
arXiv cs.LG / 3/12/2026
Key Points
- The paper identifies a coherent rank-one mean bias as the primary driver of numerical instability in FP4-quantized LLM training: blockwise quantization scales react to the extreme activation magnitudes this bias produces.
- The mean bias emerges systematically across layers and training stages and accounts for most of the extreme activation magnitudes, inflating the dynamic range each block must cover and compressing the long-tail semantic variation around the mean.
- It can be removed with a simple source-level mean subtraction, avoiding heavier spectral methods and remaining compatible with standard quantization kernels (see the sketch after this list).
- Empirical FP4 results show that mean removal narrows the loss gap to BF16 and restores downstream performance, providing a hardware-efficient path to stable low-bit LLM training.
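The mechanism and the fix are easy to see in a toy numpy sketch (not the paper's code): blockwise absmax quantization to a symmetric 4-bit grid spends nearly its whole dynamic range on a shared mean and flattens the small per-token variation around it, while subtracting the mean first lets the grid resolve that variation. The block size, the uniform 15-level grid standing in for FP4's non-uniform E2M1 format, and the magnitudes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_block_4bit(x):
    """Blockwise absmax quantization to a symmetric 4-bit grid.

    Toy stand-in for FP4: 15 uniform codes in [-7, +7], scaled so the
    block's absmax lands on the largest code (real FP4/E2M1 is non-uniform).
    """
    scale = np.abs(x).max() / 7.0
    if scale == 0.0:
        return x.copy()
    return np.clip(np.round(x / scale), -7, 7) * scale

# Activations = large coherent mean (the rank-one bias) + small
# token-wise variation (the long-tail semantic signal).
mean = 8.0                                # assumed bias magnitude
signal = 0.05 * rng.standard_normal(64)   # assumed signal scale
block = mean + signal

naive = quantize_block_4bit(block)                   # scale inflated by the mean
debiased = quantize_block_4bit(block - mean) + mean  # source-level mean subtraction

print("error, naive quantization:    ", np.abs(naive - block).mean())
print("error, after mean subtraction:", np.abs(debiased - block).mean())
# The naive path collapses every value in the block onto (almost) a single
# code; subtracting the mean lets the 4-bit grid resolve the residual signal.
```

In an actual training pipeline the subtracted mean would presumably be carried alongside the quantized block and re-added in higher precision on the dequantize path; the sketch only illustrates why the coherent rank-one component dominates the blockwise scale and why removing it restores resolution for the remaining signal.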