nvidia/Gemma-4-26B-A4B-NVFP4
Reddit r/LocalLLaMA / 5/1/2026
📰 News · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- A user reports running the NVIDIA Gemma 4 26B variant (nvidia/Gemma-4-26B-A4B-NVFP4) on an RTX 5090, with GPU memory utilization set to about 80%, which yields roughly a 50k-token context length.
- The NVFP4-quantized checkpoint is reported to be about 18.8 GB, implying substantially lower VRAM requirements than full-precision variants.
- Benchmarks show similar or slightly improved performance versus full precision on several tests, including AIME 2025 (NVFP4 90.00% vs full precision 88.95%).
- Some metrics are slightly lower (e.g., GPQA Diamond 79.90% vs 80.30%), but others are nearly the same (e.g., IFEval 96.40% vs 96.60%).
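The reported numbers above can be sanity-checked with simple arithmetic. The sketch below is a back-of-envelope VRAM budget, not anything from the post itself: it assumes the RTX 5090 has 32 GB of VRAM and treats the "80% memory allocation" figure as a vLLM-style GPU memory utilization fraction, leaving whatever remains after the 18.8 GB of weights for KV cache and activations.

```python
# Back-of-envelope VRAM budget for the reported setup.
# Assumptions (not stated in the post): the RTX 5090 carries 32 GB of VRAM,
# and the 80% figure is a gpu-memory-utilization-style allocation fraction.
GPU_VRAM_GB = 32.0   # RTX 5090 VRAM (assumed)
UTILIZATION = 0.80   # reported allocation fraction
WEIGHTS_GB = 18.8    # reported NVFP4 checkpoint size

budget_gb = GPU_VRAM_GB * UTILIZATION    # total VRAM the runtime may claim
kv_cache_gb = budget_gb - WEIGHTS_GB     # headroom for KV cache + activations

print(f"usable budget: {budget_gb:.1f} GB")
print(f"left for KV cache/activations: {kv_cache_gb:.1f} GB")
```

Under these assumptions, about 6.8 GB remains for the KV cache, which is plausible for a ~50k-token context on a sparse 26B/A4B model; the exact context ceiling depends on the runtime's KV-cache layout and precision.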
Related Articles

Black Hat USA
AI Business

Why Autonomous Coding Agents Keep Failing — And What Actually Works
Dev.to

Text-to-image is easy; chaining LLMs to generate, critique, and iterate on images autonomously is a routing nightmare. AgentSwarms now supports an image-generation playground and creative media workflows.
Reddit r/artificial

Announcing the NVIDIA Nemotron 3 Super Build Contest
Dev.to

75% of Sites Blocking AI Bots Still Get Cited. Here Is Why Blocking Does Not Work.
Dev.to