Gemma 4 26B-A4B GGUF Benchmarks

Reddit r/LocalLLaMA / 4/20/2026


Key Points

  • The post reports KL-Divergence benchmarks comparing Gemma 4 26B-A4B GGUF quantizations across providers to help users choose the best quantization.
  • Mean KL Divergence results indicate that nearly all Unsloth GGUFs lie on the Pareto frontier, suggesting strong fidelity to the original BF16 output distribution.
  • Unsloth is characterized as top-performing in 21 out of 22 sizes, with similarly strong trends observed at 99.9% KLD.
  • Unsloth updated several quant variants (e.g., Q6_K made more dynamic, and similar updates for Qwen3.6) and notes the newer versions may be slightly larger, though the prior ones remain usable without re-downloading.
  • A new UD-IQ4_NL_XL quant option (14.6GB) is introduced to fit within 16GB VRAM for Gemma 4 (and similarly for Qwen3.6), positioned between the smaller UD-IQ4_XS and larger UD-Q4_K_S variants.
Gemma 4 26B-A4B GGUF Benchmarks

Hey r/LocalLLaMA, we conducted KL Divergence benchmarks for Gemma 4 26B-A4B GGUFs across providers to help you pick the best quant.

  • Mean KL Divergence puts nearly all Unsloth GGUFs on the Pareto frontier
  • KLD shows how well a quantized model matches the original BF16 output distribution, indicating retained accuracy.
  • This makes Unsloth the top performer in 21 of 22 sizes, with a similar trend for 99.9% KLD and other metrics.
  • We also updated our Q6_K quants to be more dynamic. The previous versions were already well optimized; the new ones are slightly better but also slightly bigger. No need to re-download though - the previous quant is perfectly fine, so it's up to you if you want the slightly better version. The same was done for Qwen3.6.
  • We're also introducing a new UD-IQ4_NL_XL quant that fits in 16GB VRAM. UD-IQ4_NL_XL (14.6GB) sits between UD-IQ4_XS (13.4GB) and UD-Q4_K_S (16.4GB). The same was done for Qwen3.6.
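For readers unfamiliar with the metric, here is a minimal sketch (not Unsloth's actual benchmark harness) of how mean and 99.9th-percentile per-token KL divergence can be computed from the next-token logits of a reference BF16 model and a quantized model; the function names and the toy noise model are illustrative assumptions:

```python
# Hedged sketch: per-token KL(ref || quant) over next-token distributions,
# summarized by the mean and the 99.9th percentile (the two statistics
# reported in the post). Not the actual benchmark code.
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kld_stats(ref_logits, quant_logits):
    """ref_logits, quant_logits: (num_tokens, vocab_size) arrays.
    Returns (mean KLD, 99.9% KLD) of KL(ref || quant) per token."""
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    # Small epsilon guards against log(0) in fully pruned entries.
    per_token = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return per_token.mean(), np.percentile(per_token, 99.9)

# Toy example: quantization error modeled as small noise on the logits.
rng = np.random.default_rng(0)
ref = rng.normal(size=(1000, 256))
quant = ref + rng.normal(scale=0.05, size=ref.shape)
mean_kld, p999_kld = kld_stats(ref, quant)
print(f"Mean KLD: {mean_kld:.4f}, 99.9% KLD: {p999_kld:.4f}")
```

A lower mean KLD means the quant's output distribution stays closer to BF16 on average; the 99.9% statistic captures the worst-case tokens, where quantization damage tends to concentrate.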

For HQ versions of the graphs (Reddit mobile compresses them), see: Gemma 4 Benchmarks and Qwen3.6 Benchmarks

We also updated our MLX quants to be more dynamic with better layer selection (there are limitations due to MLX): See here

| MLX Metrics | UD-4bit (Old) | UD-4bit (New) | MLX 4.4bit MSQ |
|-------------|---------------|---------------|----------------|
| Perplexity  | 4.772         | 4.766         | 4.864          |
| Mean KLD    | 0.0177        | 0.0163        | 0.0878         |
| 99.9% KLD   | 0.8901        | 0.8398        | 2.9597         |
| Disk Size   | 21.4 GB       | 21.6 GB       | 21.2 GB        |
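The perplexity row in the table can be read as exp of the mean per-token negative log-likelihood on the evaluation text; this small sketch (illustrative, not the benchmark code) shows the relationship:

```python
# Hedged sketch: perplexity = exp(mean negative log-likelihood).
# Lower is better; it is the "effective branching factor" of the model.
import math

def perplexity(token_logprobs):
    """token_logprobs: natural-log probabilities the model assigned to
    each actual next token in the evaluation text."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Toy example: assigning probability 0.21 to every token gives a
# perplexity of 1/0.21, roughly the ~4.77 range in the table above.
logps = [math.log(0.21)] * 100
print(round(perplexity(logps), 2))  # 4.76
```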

Gemma 4 GGUFs: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF

Qwen3.6 GGUFs: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF

submitted by /u/danielhanchen