nvidia/Gemma-4-26B-A4B-NVFP4

Reddit r/LocalLLaMA / 5/1/2026

📰 News · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • A user reports that the NVIDIA Gemma 4 26B model variant (nvidia/Gemma-4-26B-A4B-NVFP4) runs on an RTX 5090, using about 80% GPU memory allocation to achieve roughly 50k context length.
  • The NVFP4 quantized model is reported to be about 18.8GB in size, implying lower VRAM requirements than full-precision variants.
  • Benchmarks show similar or slightly improved performance versus full precision on several tests, including AIME 2025 (NVFP4 90.00% vs full precision 88.95%).
  • Some metrics are slightly lower (e.g., GPQA Diamond 79.90% vs 80.30%), but others are nearly the same (e.g., IFEval 96.40% vs 96.60%).
  • Can confirm it works on a 5090; with 80% allocation (of 32 GB) I got around 50k context.
  • It's 18.8 GB.
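The reported numbers line up as simple arithmetic: 80% of the 5090's 32 GB leaves a fixed budget, and subtracting the 18.8 GB NVFP4 checkpoint shows what remains for KV cache and activations (which is what bounds the ~50k context). A minimal sketch of that budget, using only figures from the post:

```python
# VRAM budget for the reported setup: RTX 5090 (32 GB),
# 80% allocation, ~18.8 GB NVFP4 checkpoint.

TOTAL_VRAM_GB = 32.0   # RTX 5090
ALLOCATION = 0.80      # fraction of VRAM the runtime may claim
WEIGHTS_GB = 18.8      # reported NVFP4 model size

budget_gb = TOTAL_VRAM_GB * ALLOCATION   # usable budget
kv_cache_gb = budget_gb - WEIGHTS_GB     # left for KV cache + activations

print(f"usable budget: {budget_gb:.1f} GB")            # 25.6 GB
print(f"left for KV cache/overhead: {kv_cache_gb:.1f} GB")  # ~6.8 GB
```

Roughly 6.8 GB of headroom for the KV cache is consistent with the ~50k context the poster saw; the exact context limit depends on the runtime's per-token cache size.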
| Benchmark | Baseline (Full Precision) | NVFP4 |
|---|---|---|
| GPQA Diamond | 80.30% | 79.90% |
| AIME 2025 | 88.95% | 90.00% |
| MMLU Pro | 85.00% | 84.80% |
| LiveCodeBench (pass@1) | 80.50% | 79.80% |
| IFBench | 77.77% | 78.10% |
| IFEval | 96.60% | 96.40% |
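The "80% allocation" in the report maps naturally onto the memory-utilization knob exposed by serving runtimes such as vLLM. A launch sketch under that assumption (the post does not name the runtime; the flags shown are vLLM's, and the model's availability under this ID is taken from the post):

```shell
# Hypothetical single-GPU launch on an RTX 5090, assuming a vLLM-style runtime:
# cap the engine at 80% of VRAM and request a ~50k-token context window.
vllm serve nvidia/Gemma-4-26B-A4B-NVFP4 \
    --gpu-memory-utilization 0.8 \
    --max-model-len 50000
```

If the KV cache does not fit at the requested length, the engine will fail at startup; lowering `--max-model-len` or raising `--gpu-memory-utilization` trades context length against headroom.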
submitted by /u/reto-wyss