New Bartowski Gemma 4 quants are a lot slower?

Reddit r/LocalLLaMA / 4/11/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • Bartowski has released new quantized versions (quants) of Gemma 4, and at least one user reports significantly slower throughput after switching to the new files.
  • The reported drop is to roughly half the token-generation speed (tg/s) and about 75% of the prompt-processing speed (pp/s) on the 26B and E4B models.
  • The poster speculates that the model weights themselves are unchanged, and that changes to the GGUF header, or llama.cpp features it now enables, may be causing the slowdown on their hardware.
  • Commenters are asked what changed between the original and new quant releases, and whether specific runtime/compiler features or quantization settings are responsible.
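One hedged way to investigate the speculation above is to diff the GGUF metadata of the old and new quant files. The sketch below assumes the `gguf` Python package that ships with llama.cpp; the file paths and the `diff_metadata` helper are illustrative, not part of the original post.

```python
# Sketch: spot header differences between two quant releases by
# comparing their GGUF metadata. Only the generic dict-diff helper is
# shown executing; the GGUF-reading usage is illustrative (assumes
# `pip install gguf` and placeholder file paths).

def diff_metadata(old: dict, new: dict) -> dict:
    """Return keys added, removed, or changed between two metadata dicts."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k])
               for k in old.keys() & new.keys() if old[k] != new[k]}
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical usage with the gguf package (field names alone are often
# enough to spot a newly-set header entry):
#
#   from gguf import GGUFReader
#   old = {name: None for name in GGUFReader("gemma4-old.Q4_K_M.gguf").fields}
#   new = {name: None for name in GGUFReader("gemma4-new.Q4_K_M.gguf").fields}
#   print(diff_metadata(old, new))
```

Any key showing up under `"added"` or `"changed"` (for example a new tokenizer or rope-scaling field) would be a candidate explanation for different llama.cpp behavior at load time.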

Bartowski has uploaded new quants for Gemma 4. I've downloaded them for 26B and E4B.

Compared to his original release I'm getting about half the tg/s for both of them, and about 75% of the pp/s.

Does anyone know what changed? I'm assuming the weights aren't the problem but maybe the gguf header now enables a llama.cpp feature that my hardware dislikes?

Thanks for any information!

submitted by /u/Top-Rub-4670