ggml : add NVFP4 quantization type support
Reddit r/LocalLLaMA / 3/13/2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage
Key Points
- The change adds support for NVIDIA's NVFP4 quantization format to GGML/llama.cpp, introducing a new GGML_TYPE_NVFP4 type along with the corresponding block structures and conversion helpers.
- The update extends convert_hf_to_gguf.py to detect NVFP4 ModelOpt models and repack their weights into the GGUF block format.
- The CPU backend gets a scalar dot-product implementation along with an ARM NEON path, and tests were added covering backend operations and the quantization functions; the change was verified with NVFP4 models from Hugging Face and basic server smoke tests on an Apple M5.
- The feature ships from the b8297 release tag onwards, and a test model, Qwen3-4B-NVFP4-GGUF, is provided for trying it out.