"It's available from b8297 onwards. Get the latest llama.cpp version."
ggml : add NVFP4 quantization type support
Reddit r/LocalLLaMA / 3/13/2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage
Key Points
- The change adds support for NVIDIA's NVFP4 quantization to GGML/llama.cpp, introducing a new GGML_TYPE_NVFP4 type along with its block structures and conversion helpers.
- It also updates convert_hf_to_gguf.py to detect NVFP4 ModelOpt models and repack them into the GGUF block format.
- The CPU backend gets a scalar dot product along with an ARM NEON path, and tests were added for backend operations and quantization functions; the change was validated with NVFP4 models from HuggingFace and basic server smoke tests on an Apple M5.
- The feature ships from the b8297 release tag onwards, with a test model, Qwen3-4B-NVFP4-GGUF, provided for verification.