| Hi! Just checking, am I the only one who has serious issues with Gemma 4 locally? I've played around with Gemma 4 using Unsloth quants on llama.cpp, and it's seriously broken. I'm using the latest changes from llama.cpp, along with the recommended temperature, top-p and top-k. Giving it an article and asking it to list all typos along with the corrected version gives total nonsense. Here is a random news article I tested it with: https://www.bbc.com/news/articles/ce843ge47z4o I've tried the 26B MoE, I've tried the 31B, and I've tried UD-Q8_K_XL, Q8_0, and UD-Q4_K_XL. They all have the same issue. As a control, I tested the same thing in Google AI Studio, and there the models work great, finding actual typos instead of the nonsense I get locally. |
Gemma 4 is seriously broken when using Unsloth and llama.cpp
Reddit r/LocalLLaMA / 4/3/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage
Key Points
- A Reddit user reports that Gemma 4 generates nonsensical outputs when run locally using Unsloth quantizations on llama.cpp, failing a basic typo-correction test on a news article.
- The issue reportedly occurs across multiple Gemma 4 variants (26B MoE and 31B) and several quantization formats (UD-Q8_K_XL, Q8_0, UD-Q4_K_XL), suggesting a broader quantization or inference-compatibility problem rather than an issue with any single checkpoint.
- The same typo-finding task works correctly in Google AI Studio, indicating the model itself may behave as expected in managed environments.
- The user is using the latest llama.cpp changes and standard sampling settings, implying the behavior may be specific to the local toolchain (Unsloth + llama.cpp) rather than prompting parameters.
- The post functions as an early warning signal for developers relying on local Gemma 4 deployments, pointing to the need for troubleshooting quantization/inference compatibility.
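For reproducing or triaging reports like this one, a minimal local test might look like the following. This is a hedged sketch: `--temp`, `--top-p`, `--top-k`, `-m`, and `-p` are real `llama-cli` options in llama.cpp, but the GGUF filename is hypothetical and the sampling values are assumptions based on Google's published recommendations for earlier Gemma releases, since the post does not quote its exact command line.

```shell
# Hypothetical reproduction of the reported setup.
# The model filename is a placeholder; the sampling values (temp 1.0,
# top-p 0.95, top-k 64) are assumed, not confirmed by the post.
./llama-cli \
  -m gemma-4-31b-UD-Q8_K_XL.gguf \
  --temp 1.0 --top-p 0.95 --top-k 64 \
  -p "List every typo in the following article and give the corrected version: ..."
```

Running the same prompt against the hosted model (as the poster did in Google AI Studio) gives a control for separating model-quality issues from local quantization or inference-engine bugs.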