New Bartowski Gemma 4 quants are a lot slower?

Reddit r/LocalLLaMA / 4/11/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • Bartowski has released new quantized versions (quants) of Gemma 4, and at least one user reports significantly slower throughput after switching to the new files.
  • The reported drop is to roughly half the token-generation speed (tg/s) and about 75% of the prompt-processing speed (pp/s) on the 26B and E4B models.
  • The poster speculates that the model weights themselves are unchanged, and that changes to the GGUF header, or llama.cpp features it now enables, may be causing the slowdown on their hardware.
  • Commenters are asked what changed between the original and new quant releases, and whether specific runtime/compiler features or quantization settings are responsible.
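One hedged way to investigate the speculation above is to diff the GGUF metadata of the old and new quant files. The sketch below assumes the `gguf` Python package that ships with llama.cpp; the file paths and the `diff_metadata` helper are illustrative, not part of the original post.

```python
# Sketch: spot header differences between two quant releases by
# comparing their GGUF metadata. Only the generic dict-diff helper is
# shown executing; the GGUF-reading usage is illustrative (assumes
# `pip install gguf` and placeholder file paths).

def diff_metadata(old: dict, new: dict) -> dict:
    """Return keys added, removed, or changed between two metadata dicts."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k])
               for k in old.keys() & new.keys() if old[k] != new[k]}
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical usage with the gguf package (field names alone are often
# enough to spot a newly-set header entry):
#
#   from gguf import GGUFReader
#   old = {name: None for name in GGUFReader("gemma4-old.Q4_K_M.gguf").fields}
#   new = {name: None for name in GGUFReader("gemma4-new.Q4_K_M.gguf").fields}
#   print(diff_metadata(old, new))
```

Any key showing up under `"added"` or `"changed"` (for example a new tokenizer or rope-scaling field) would be a candidate explanation for different llama.cpp behavior at load time.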

Bartowski has uploaded new quants for Gemma 4. I've downloaded them for 26B and E4B.

Compared to his original release I'm getting about half the tg/s for both of them, and about 75% of the pp/s.

Does anyone know what changed? I'm assuming the weights aren't the problem but maybe the gguf header now enables a llama.cpp feature that my hardware dislikes?

Thanks for any information!

submitted by /u/Top-Rub-4670