Gemma 4 26B-A4B GGUF Benchmarks

Reddit r/LocalLLaMA / 4/20/2026


Key Points

  • The post reports KL-Divergence benchmarks comparing Gemma 4 26B-A4B GGUF quantizations across providers to help users choose the best quantization.
  • Mean KL Divergence results indicate that nearly all Unsloth GGUFs lie on the Pareto frontier, suggesting strong fidelity to the original BF16 output distribution.
  • Unsloth is characterized as top-performing in 21 out of 22 sizes, with similarly strong trends observed at 99.9% KLD.
  • Unsloth updated several quant variants (e.g., Q6_K made more dynamic, and similar updates for Qwen3.6) and notes the newer versions may be slightly larger, though the prior ones remain usable without re-downloading.
  • A new UD-IQ4_NL_XL quant option (14.6GB) is introduced to fit within 16GB VRAM for Gemma 4 (and similarly for Qwen3.6), positioned between the smaller UD-IQ4_XS and larger UD-Q4_K_S variants.
Gemma 4 26B-A4B GGUF Benchmarks

Hey r/LocalLLaMA, we conducted KL Divergence benchmarks for Gemma 4 26B-A4B GGUFs across providers to help you pick the best quant.

  • Mean KL Divergence puts nearly all Unsloth GGUFs on the Pareto frontier
  • KLD shows how well a quantized model matches the original BF16 output distribution, indicating retained accuracy.
  • This makes Unsloth the top performer in 21 of 22 sizes, with a similar trend for 99.9% KLD and other metrics.
  • We also updated our Q6_K quants to be more dynamic. The previous versions were already well optimized; the new ones are slightly better but also slightly bigger. No need to re-download though - the previous quant is perfectly fine, so it's up to you if you want the slightly better version. The same was done for Qwen3.6.
  • We're also introducing a new UD-IQ4_NL_XL quant that fits in 16GB VRAM. UD-IQ4_NL_XL (14.6GB) sits between UD-IQ4_XS (13.4GB) and UD-Q4_K_S (16.4GB). The same was done for Qwen3.6.
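For readers unfamiliar with the metric, here is a minimal sketch (not Unsloth's actual benchmark harness) of how mean and 99.9th-percentile per-token KL divergence can be computed from the next-token logits of a reference BF16 model and a quantized model; the function names and the toy noise model are illustrative assumptions:

```python
# Hedged sketch: per-token KL(ref || quant) over next-token distributions,
# summarized by the mean and the 99.9th percentile (the two statistics
# reported in the post). Not the actual benchmark code.
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kld_stats(ref_logits, quant_logits):
    """ref_logits, quant_logits: (num_tokens, vocab_size) arrays.
    Returns (mean KLD, 99.9% KLD) of KL(ref || quant) per token."""
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    # Small epsilon guards against log(0) in fully pruned entries.
    per_token = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return per_token.mean(), np.percentile(per_token, 99.9)

# Toy example: quantization error modeled as small noise on the logits.
rng = np.random.default_rng(0)
ref = rng.normal(size=(1000, 256))
quant = ref + rng.normal(scale=0.05, size=ref.shape)
mean_kld, p999_kld = kld_stats(ref, quant)
print(f"Mean KLD: {mean_kld:.4f}, 99.9% KLD: {p999_kld:.4f}")
```

A lower mean KLD means the quant's output distribution stays closer to BF16 on average; the 99.9% statistic captures the worst-case tokens, where quantization damage tends to concentrate.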

For HQ versions of the graphs (Reddit mobile compresses them), see: Gemma 4 Benchmarks and Qwen3.6 Benchmarks

We also updated our MLX quants to be more dynamic with better layer selection (there are limitations due to MLX): See here

| MLX Metrics | UD-4bit (Old) | UD-4bit (New) | MLX 4.4bit MSQ |
|-------------|---------------|---------------|----------------|
| Perplexity  | 4.772         | 4.766         | 4.864          |
| Mean KLD    | 0.0177        | 0.0163        | 0.0878         |
| 99.9% KLD   | 0.8901        | 0.8398        | 2.9597         |
| Disk Size   | 21.4 GB       | 21.6 GB       | 21.2 GB        |
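The perplexity row in the table can be read as exp of the mean per-token negative log-likelihood on the evaluation text; this small sketch (illustrative, not the benchmark code) shows the relationship:

```python
# Hedged sketch: perplexity = exp(mean negative log-likelihood).
# Lower is better; it is the "effective branching factor" of the model.
import math

def perplexity(token_logprobs):
    """token_logprobs: natural-log probabilities the model assigned to
    each actual next token in the evaluation text."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Toy example: assigning probability 0.21 to every token gives a
# perplexity of 1/0.21, roughly the ~4.77 range in the table above.
logps = [math.log(0.21)] * 100
print(round(perplexity(logps), 2))  # 4.76
```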

Gemma 4 GGUFs: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF

Qwen3.6 GGUFs: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF

submitted by /u/danielhanchen