ggml : add NVFP4 quantization type support
Reddit r/LocalLLaMA / 3/13/2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage
Key Points
- The change adds support for NVIDIA's NVFP4 quantization format to GGML/llama.cpp, introducing a new GGML_TYPE_NVFP4 type along with the corresponding block structures and conversion helpers.
- The update extends convert_hf_to_gguf.py to detect NVFP4 ModelOpt models and repack their weights into the GGUF block format.
- The CPU backend gets a scalar dot-product implementation along with an ARM NEON path, and tests were added covering backend operations and the quantization functions; the change was verified with NVFP4 models from Hugging Face and basic server smoke tests on an Apple M5.
- The feature ships from the b8297 release tag onwards, and a test model, Qwen3-4B-NVFP4-GGUF, is provided for trying it out.