Hi. I'm trying to use llama.cpp to get workable Gemma 4 inference, but nothing I've found works. I'm on the latest llama.cpp, and I've now tested it across three versions. I initially assumed I just had to wait for llama.cpp to catch up, and the models do load now, where before they didn't at all, but the same issues persist. I've tried a few of the v4 models, and the results are either lobotomized or extremely slow. Today I tried this one:
llama-server.exe -m .\models\30B\gemma-4-26B-A4B-it-heretic.bf16.gguf ^
  --mmproj .\models\30B\gemma-4-26B-A4B-it-heretic-mmproj.f32.gguf ^
  --jinja -ngl 200 --ctx-size 262144 --host 0.0.0.0 --port 13210 --no-warmup ^
  --temp 0.6 --top-k 64 --top-p 0.95 --min-p 0.0 ^
  --image-min-tokens 256 --image-max-tokens 8192 --swa-full
... and it was generating at about 3 t/s. I have an RTX 6000 Pro, so something is obviously wrong there. I specifically want to test its image analysis, but at that speed, that's not going to happen. I'd prefer to stick with a heretic version, but I've tried several other variants and hit the same issues.
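For reference, the stripped-down text-only command below is what I'm planning to test next, to narrow down whether the slowdown comes from the bf16 weights, the vision projector, or my flags. This is just a minimal sketch: the Q4_K_M filename is a guess on my part, and I haven't confirmed that quant actually exists for this model.

llama-server.exe -m .\models\30B\gemma-4-26B-A4B-it-heretic.Q4_K_M.gguf ^
  --jinja -ngl 99 --ctx-size 8192 ^
  --temp 0.6 --top-k 64 --top-p 0.95 --min-p 0.0

If that runs at a sane speed, the problem is presumably in the bf16 weights or the mmproj/context settings rather than llama.cpp itself.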
Does anyone have a working llama.cpp launch command they can share?