llama.cpp's Preliminary SM120 Native NVFP4 MMQ Is Merged

Reddit r/LocalLLaMA / 4/29/2026

📰 News · Developer Stack & Infrastructure · Signals & Early Trends · Models & Research

Key Points

  • llama.cpp has merged a “preliminary” implementation of SM120-native NVFP4 MMQ (quantized matrix multiplication), opening a new low-level optimization path on supported NVIDIA GPUs; SM120 corresponds to consumer Blackwell cards such as the RTX 50 series (see the sketch after this list).
  • The change is introduced via a specific upstream pull request (PR #22196) in the llama.cpp repository.
  • GGUF model files in NVFP4-compatible quantizations already appear to be emerging on Hugging Face, including Gemma-4 and Nemotron variants.
  • This rapid community follow-through on the new quantization format may accelerate local inference experimentation.
  • While labeled preliminary, the merge indicates active development momentum toward broader GPU-native performance features in llama.cpp.
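
For a concrete sense of what NVFP4 stores, here is a minimal C++ sketch of dequantizing one NVFP4 micro-block: 16 E2M1 (4-bit float) values packed two per byte, multiplied by a per-block scale. The nibble packing order and the plain-float scale are illustrative assumptions (the real format uses an FP8 E4M3 block scale), and llama.cpp's actual SM120 MMQ kernels run on tensor-core fragments rather than a scalar loop like this.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

// E2M1 magnitude table: the eight non-negative values representable by a
// 4-bit float with 2 exponent bits and 1 mantissa bit.
static const float kE2M1[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Decode one 4-bit code: high bit is the sign, low 3 bits index the table.
static float decode_fp4(uint8_t code) {
    float mag = kE2M1[code & 0x7];
    return (code & 0x8) ? -mag : mag;
}

// Dequantize a 16-element micro-block: 8 packed bytes, two FP4 codes per
// byte (low nibble first -- an assumption), all scaled by block_scale.
static std::array<float, 16> dequant_block(const uint8_t packed[8],
                                           float block_scale) {
    std::array<float, 16> out{};
    for (int i = 0; i < 8; ++i) {
        out[2 * i + 0] = decode_fp4(packed[i] & 0x0F) * block_scale;
        out[2 * i + 1] = decode_fp4(packed[i] >> 4)   * block_scale;
    }
    return out;
}

int main() {
    // Example: each byte packs codes 0x1 (+0.5) and 0x9 (-0.5); with a
    // block scale of 2.0 the decoded values alternate +1.0 / -1.0.
    uint8_t packed[8];
    for (auto &b : packed) b = 0x91;
    for (float v : dequant_block(packed, 2.0f)) printf("%g ", v);
    printf("\n");
}
```

The lookup table above is the entire value range an E2M1 element can take, which is why the per-block scale carries so much of the format's accuracy; the appeal of an SM120-native MMQ path is doing this decode-and-multiply inside the tensor-core pipeline instead of dequantizing to a wider type first.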