HiFloat4 Format for Language Model Pre-training on Ascend NPUs
arXiv cs.AI / 4/13/2026
Key Points
- The paper studies HiFloat4, a 4-bit floating-point (FP4) format optimized for Huawei Ascend NPUs, for language model pre-training.
- It compares HiFloat4 against MXFP4 in large-scale training runs where linear and expert GEMM operations are executed entirely in FP4 precision (a minimal MXFP4-style quantization sketch follows this list).
- Experiments cover both dense model architectures (e.g., Pangu- and LLaMA-style) and mixture-of-experts (MoE) models, including expert-specific GEMMs.
- The authors propose FP4-specific stabilization techniques that keep relative error within about 1% of full-precision baselines while retaining the efficiency gains of 4-bit compute.
- Overall, the work provides an empirical view of the practical trade-offs between FP4 formats for NPU-based LLM training and highlights how to mitigate FP4 numerical degradation.
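To make the comparison concrete, the sketch below illustrates MXFP4-style block quantization: values are rounded to the FP4 E2M1 grid in blocks that share one power-of-two scale (block size 32, following the OCP Microscaling convention), and the relative error of a fake-quantized GEMM is measured against an FP32 baseline. The scale-selection rule, matrix shapes, and error metric are illustrative assumptions, not the paper's implementation, and nothing here reflects HiFloat4's internal encoding or the authors' stabilization techniques.

```python
# Illustrative sketch only: MXFP4-style block quantization to the E2M1 grid,
# followed by a relative-error check of a quantized GEMM against FP32.
import numpy as np

# Representable magnitudes of an FP4 E2M1 element (sign handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def quantize_mxfp4_blocks(x, block=32):
    """Round each block of `block` values to the E2M1 grid under a shared 2^k scale."""
    flat = x.astype(np.float32).ravel()
    flat = np.pad(flat, (0, (-flat.size) % block))
    blocks = flat.reshape(-1, block)
    # Choose a power-of-two scale per block so the largest magnitude fits within 6.0
    # (an illustrative rule; the OCP spec defines the shared scale slightly differently).
    max_abs = np.abs(blocks).max(axis=1, keepdims=True)
    safe = np.maximum(max_abs, np.finfo(np.float32).tiny)
    exp = np.where(max_abs > 0, np.ceil(np.log2(safe / E2M1_GRID[-1])), 0.0)
    scale = np.exp2(exp).astype(np.float32)
    scaled = blocks / scale
    # Round-to-nearest onto the signed E2M1 grid, then rescale.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    q = np.sign(scaled) * E2M1_GRID[idx]
    return (q * scale).ravel()[: x.size].reshape(x.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in weight matrix
a = rng.normal(size=(64, 256)).astype(np.float32)   # stand-in activations

y_fp32 = a @ w.T                                                # full-precision baseline
y_fp4 = quantize_mxfp4_blocks(a) @ quantize_mxfp4_blocks(w).T   # fake-quantized GEMM
rel_err = np.linalg.norm(y_fp4 - y_fp32) / np.linalg.norm(y_fp32)
print(f"relative GEMM error: {rel_err:.3%}")
```

The per-block power-of-two scale is what keeps FP4 GEMMs cheap in practice: the scale can be folded into the accumulation while only 4-bit codes move through the matrix engine, which is the efficiency gain the stabilization techniques in the paper aim to preserve.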