FP4 inference in llama.cpp (NVFP4) and ik_llama.cpp (MXFP4) landed - Finally
Reddit r/LocalLLaMA / 4/26/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Models & Research

Both llama.cpp and ik_llama.cpp now have FP4 support, but with different flavors worth knowing about. llama.cpp recently merged NVFP4, Nvidia's block-scaled FP4 (`GGML_TYPE_NVFP4 = 40`), with CUDA kernels landing in `mmq.cuh`, `mmvq.cu`, `convert.cu`, and others. ik_llama.cpp has had MXFP4 (`GGML_TYPE_MXFP4 = 39`) since PR #682; that's the MX-standard FP4 used in gpt-oss models. Its coverage is actually broader: CPU (AVX2, NEON, Zen4) and CUDA are all implemented. The two are not the same wire format: NVFP4 is Nvidia-specific, pairing FP4 elements with E4M3 block scales, while MXFP4 follows the MX consortium standard. Both, however, land in the 4-bit float regime and should bring meaningful VRAM savings once model support catches up.

Verified by grepping both repos locally today. My specs: 5090 (24GB VRAM). Go grab and play with models. Personal favorite ones:

Exciting times for quantization.

correction: removed "Meta's"
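As an illustration of what these 4-bit float formats look like on the wire, here is a minimal decoding sketch. It assumes the layouts from the public OCP Microscaling (MX) spec (FP4 E2M1 elements, 32 per block, sharing one E8M0 power-of-two scale); it is not llama.cpp's actual GGUF storage layout.

```python
# Sketch: decoding FP4 (E2M1) codes and an MXFP4 block, per the OCP MX spec.
# Illustrative only; real GGUF tensors pack these bytes differently.

# The 8 non-negative E2M1 magnitudes (sign bit handled separately).
# Exponent bits 00 are subnormal (0 or 0.5); otherwise value = 2^(e-1) * (1 + m/2).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_fp4(code: int) -> float:
    """Decode a 4-bit E2M1 code: 1 sign bit, 2 exponent bits, 1 mantissa bit."""
    sign = -1.0 if (code & 0b1000) else 1.0
    return sign * E2M1_MAGNITUDES[code & 0b0111]

def decode_mxfp4_block(scale_e8m0: int, codes: list[int]) -> list[float]:
    """Decode one MXFP4 block: 32 E2M1 elements sharing one E8M0 scale.

    E8M0 is a pure power-of-two scale with bias 127 (no sign, no mantissa).
    """
    assert len(codes) == 32
    scale = 2.0 ** (scale_e8m0 - 127)
    return [scale * decode_fp4(c) for c in codes]

# Example: scale 2^1 (E8M0 code 128), codes sweeping the positive E2M1 grid
vals = decode_mxfp4_block(128, list(range(8)) * 4)
print(vals[:8])  # [0.0, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 12.0]
```

NVFP4 uses the same E2M1 element grid but smaller 16-element blocks, each scaled by an FP8 E4M3 value rather than a pure power of two, which is the core wire-format difference the post points at.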
Key Points
- llama.cpp merged inference support for NVFP4, Nvidia's block-scaled FP4 (`GGML_TYPE_NVFP4 = 40`), including CUDA kernels (`mmq.cuh`, `mmvq.cu`, `convert.cu`, and others).
- ik_llama.cpp has offered MXFP4 (`GGML_TYPE_MXFP4 = 39`) since PR #682, with broad coverage across CPU (AVX2/NEON/Zen4) and CUDA.
- NVFP4 and MXFP4 are not the same wire format: NVFP4 is Nvidia-specific (E4M3 block scaling), while MXFP4 follows the MX consortium standard.
- Both enable inference in the 4-bit floating-point (FP4) regime, so wider model-side support could bring substantial VRAM savings.
- The post cites example NVFP4 models on Hugging Face and reports a successful local verification.
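The VRAM-savings point can be made concrete with a little arithmetic. The sketch below assumes the published block sizes (16 elements per E4M3 scale for NVFP4, 32 per E8M0 scale for MXFP4) and ignores GGUF metadata and the tensors kept at higher precision, so real files will be somewhat larger.

```python
# Sketch: effective bits-per-weight and rough weight VRAM for the FP4 formats.
# Block sizes follow the public format specs; actual GGUF files add metadata
# and keep some tensors (e.g. embeddings) at higher precision.

def bits_per_weight(elem_bits: int, scale_bits: int, block_size: int) -> float:
    """Element bits plus the per-block scale amortized over the block."""
    return elem_bits + scale_bits / block_size

nvfp4 = bits_per_weight(4, 8, 16)   # FP4 elems + E4M3 scale per 16 -> 4.5 bpw
mxfp4 = bits_per_weight(4, 8, 32)   # FP4 elems + E8M0 scale per 32 -> 4.25 bpw

def gib_for(params_billions: float, bpw: float) -> float:
    """Approximate GiB needed for the weights alone at a given bits-per-weight."""
    return params_billions * 1e9 * bpw / 8 / 2**30

for name, bpw in [("NVFP4", nvfp4), ("MXFP4", mxfp4), ("FP16", 16.0)]:
    print(f"{name}: {bpw} bpw, ~{gib_for(70, bpw):.1f} GiB for 70B weights")
```

At roughly 4.25–4.5 bits per weight, both formats cut weight memory to a bit over a quarter of FP16, which is where the "meaningful VRAM savings" claim comes from.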