The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training
arXiv cs.LG / 3/12/2026
Key Points
- The paper identifies a coherent rank-one mean bias in activations as the primary driver of numerical instability in FP4-quantized LLM training: blockwise quantization scales must stretch to cover the extreme magnitudes it produces.
- This mean bias emerges systematically across layers and training stages and accounts for most of the extreme activation magnitudes, inflating the dynamic range each block must represent and compressing long-tail semantic variation.
- It can be removed with a simple source-level mean subtraction, avoiding heavy spectral methods while keeping standard quantization kernels unchanged (see the sketch after this list).
- Empirical FP4 results show that mean removal narrows the loss gap to BF16 and restores downstream performance, providing a hardware-efficient path to stable low-bit LLM training.
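To make the mechanism concrete, here is a minimal NumPy sketch, not the paper's code: it builds synthetic activations with a rank-one per-channel mean, quantizes them with absmax-scaled blockwise FP4 on the E2M1 value grid, and compares reconstruction error with and without mean subtraction. The block size of 32, the synthetic bias magnitude, and the helper names are illustrative assumptions.

```python
import numpy as np

def quantize_blockwise_fp4(x, block_size=32):
    """Simulate blockwise FP4 (E2M1) quantization: each block shares one
    absmax-derived scale, so a single large value inflates the block's
    scale and crushes the resolution left for everything else."""
    # Representable FP4 E2M1 magnitudes.
    grid = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
    flat = x.reshape(-1, block_size)
    scales = np.abs(flat).max(axis=1, keepdims=True) / 6.0 + 1e-12
    scaled = flat / scales
    # Round each scaled value to the nearest representable magnitude.
    idx = np.abs(np.abs(scaled)[..., None] - grid).argmin(axis=-1)
    deq = np.sign(scaled) * grid[idx] * scales
    return deq.reshape(x.shape)

def quant_error(x):
    """RMS error between the tensor and its FP4-quantized reconstruction."""
    return np.sqrt(np.mean((quantize_blockwise_fp4(x) - x) ** 2))

rng = np.random.default_rng(0)
tokens, hidden = 256, 1024

# Hypothetical activations: a small "semantic" signal plus a large,
# coherent per-channel mean (a rank-one component across tokens).
signal = 0.05 * rng.standard_normal((tokens, hidden))
mean_bias = np.outer(np.ones(tokens), 3.0 * rng.standard_normal(hidden))
acts = signal + mean_bias

# Naive FP4: the rank-one bias dominates each block's absmax scale.
err_naive = quant_error(acts)

# Mean subtraction before quantization (the subtracted mean can, in
# principle, be re-applied later in higher precision).
mu = acts.mean(axis=0, keepdims=True)
err_centered = quant_error(acts - mu)

print(f"RMS quantization error, naive FP4:       {err_naive:.4f}")
print(f"RMS quantization error, mean-subtracted: {err_centered:.4f}")
```

In this toy setting the bias dominates each block's absmax scale, so the residual signal falls below the quantization step and is largely lost; centering restores it. Whether the same gap appears in a real training run depends on the actual activation statistics.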