| Hi! Just checking, am I the only one who has serious issues with Gemma 4 locally? I've played around with Gemma 4 using Unsloth quants on llama.cpp, and it's seriously broken. I'm using the latest changes from llama.cpp, along with the recommended temperature, top-p and top-k. Giving it an article and asking it to list all typos along with the corrected version gives total nonsense. Here is a random news article I tested it with: https://www.bbc.com/news/articles/ce843ge47z4o I've tried the 26B MoE, I've tried the 31B, and I've tried UD-Q8_K_XL, Q8_0, and UD-Q4_K_XL. They all have the same issue. As a control, I tested the same thing in Google AI Studio, and there the models work great, finding actual typos instead of the nonsense I get locally. |
Gemma 4 is seriously broken when using Unsloth and llama.cpp
Reddit r/LocalLLaMA / 4/3/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage
Key Points
- A Reddit user reports that Gemma 4 generates nonsensical outputs when run locally using Unsloth quantizations on llama.cpp, failing a basic typo-correction test on a news article.
- The issue reportedly occurs across multiple Gemma 4 variants (26B MoE and 31B) and several quantization formats (UD-Q8_K_XL, Q8_0, UD-Q4_K_XL), suggesting a broader quantization or inference-compatibility problem rather than an issue with any single checkpoint.
- The same typo-finding task works correctly in Google AI Studio, indicating the model itself may behave as expected in managed environments.
- The user is using the latest llama.cpp changes and standard sampling settings, implying the behavior may be specific to the local toolchain (Unsloth + llama.cpp) rather than prompting parameters.
- The post functions as an early warning signal for developers relying on local Gemma 4 deployments, pointing to the need for troubleshooting quantization/inference compatibility.
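For reproducing or triaging reports like this one, a minimal local test might look like the following. This is a hedged sketch: `--temp`, `--top-p`, `--top-k`, `-m`, and `-p` are real `llama-cli` options in llama.cpp, but the GGUF filename is hypothetical and the sampling values are assumptions based on Google's published recommendations for earlier Gemma releases, since the post does not quote its exact command line.

```shell
# Hypothetical reproduction of the reported setup.
# The model filename is a placeholder; the sampling values (temp 1.0,
# top-p 0.95, top-k 64) are assumed, not confirmed by the post.
./llama-cli \
  -m gemma-4-31b-UD-Q8_K_XL.gguf \
  --temp 1.0 --top-p 0.95 --top-k 64 \
  -p "List every typo in the following article and give the corrected version: ..."
```

Running the same prompt against the hosted model (as the poster did in Google AI Studio) gives a control for separating model-quality issues from local quantization or inference-engine bugs.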