MXNorm: Reusing MXFP block scales for efficient tensor normalisation
arXiv cs.LG / 3/16/2026
Key Points
- MXNorm is a drop-in replacement for RMSNorm that estimates the RMS statistic using only the block scales from MXFP8; since MXFP8 stores one shared scale per 32-element block, the reduction needed for normalization shrinks by 32x.
- The method is validated on pre-training of Llama 3 models (125M, 1B, 8B) with minimal accuracy loss compared to an RMSNorm baseline.
- It achieves kernel speedups of up to 2.4x using only torch.compile, with reported transformer-layer speedups of around 1.3% for Llama 3 8B under MXFP8 and 2.6% under NVFP4.
- As a hardware-conscious optimization that reuses existing MXFP8 scales, MXNorm reduces normalization compute and improves efficiency without requiring major changes to model code.
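The core idea above can be sketched in a few lines: since an MXFP8-style quantizer already computes one shared scale per 32-element block, those scales can stand in for the full elementwise reduction that RMSNorm normally performs. The sketch below is an illustration of that principle, not the paper's exact estimator; the helper names (`block_scales`, `mxnorm_rms_estimate`) and the unit calibration constant are assumptions, and real MXFP8 additionally rounds scales to powers of two, which is omitted here.

```python
import torch

def block_scales(x: torch.Tensor, block: int = 32) -> torch.Tensor:
    # MXFP8-style per-block scales: one shared scale for every block of
    # 32 elements, derived from the block's absolute maximum.
    # (Real MXFP8 rounds this to a power of two; omitted for clarity.)
    blocks = x.reshape(-1, block)
    return blocks.abs().amax(dim=-1)

def mxnorm_rms_estimate(x: torch.Tensor, block: int = 32) -> torch.Tensor:
    # Hypothetical RMS estimate using only the per-block scales:
    # the reduction now runs over n/32 scales instead of n elements.
    # A calibration constant would absorb the systematic max-vs-RMS gap;
    # it is set to 1 here for illustration.
    scales = block_scales(x, block)
    return scales.pow(2).mean().sqrt()

def rmsnorm_exact(x: torch.Tensor) -> torch.Tensor:
    # Reference: the full elementwise RMS reduction used by RMSNorm.
    return x.pow(2).mean().sqrt()
```

Because each block's absolute maximum upper-bounds that block's RMS, the scale-based estimate systematically overestimates the true RMS by a roughly constant factor for well-behaved activations, which is why a fixed calibration constant can make it usable as a drop-in normalizer.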