Share your llama-server init strings for Gemma 4 models.

Reddit r/LocalLLaMA / 4/8/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • A user on Reddit is trying to run Gemma 4 models via llama.cpp (local inference) and cannot find init strings that produce workable results.
  • Although newer llama.cpp versions load the models, the user reports persistent problems where outputs are “lobotomized” or generation is extremely slow (around 3 tokens/second on an RTX 6000 Pro).
  • The user is experimenting with specific Gemma 4 variants (including “heretic” versions) and wants to test image analysis, but current performance makes that impractical.
  • They ask the community to share working llama-server init strings/configurations, including relevant flags such as offloading, context size, and multimodal/image token settings.

Hi. I'm trying to use llama.cpp to get workable Gemma 4 inference, but I haven't found anything that works. I'm on the latest llama.cpp, and I've now tested three versions of it. I thought it might just be a matter of waiting for llama.cpp to catch up, and the models do load now, where before they didn't at all, but the same issues persist. I've tried a few of the Gemma 4 models, but the results are either lobotomized or extremely slow. I tried this one today:

llama-server.exe -m .\models\30B\gemma-4-26B-A4B-it-heretic.bf16.gguf --jinja -ngl 200 --ctx-size 262144 --host 0.0.0.0 --port 13210 --no-warmup --mmproj .\models\30B\gemma-4-26B-A4B-it-heretic-mmproj.f32.gguf --temp 0.6 --top-k 64 --top-p 0.95 --min-p 0.0 --image-min-tokens 256 --image-max-tokens 8192 --swa-full

... and it was generating at 3 t/s. I have an RTX 6000 Pro, so something is obviously wrong there. I specifically want to test its image analysis, but at that speed, that's not going to happen. I want to use a heretic version, and I've tried different versions, but I get the same issues.
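For comparison, a hedged variant of the same launch line that is often faster on a single large GPU: it caps the context (the 262144-token KV cache alone can consume enormous VRAM and memory bandwidth), enables flash attention, and quantizes the KV cache. This is a sketch to experiment with, not a known-good config; flag availability varies by llama.cpp build, and the model paths are taken from the command above.

```shell
:: Sketch only: same model paths as in the post, but with a smaller context,
:: flash attention (-fa), and a q8_0-quantized KV cache (-ctk/-ctv) to cut
:: VRAM use. Verify flag names with `llama-server --help` on your build.
llama-server.exe -m .\models\30B\gemma-4-26B-A4B-it-heretic.bf16.gguf ^
  --jinja -ngl 99 --ctx-size 32768 -fa ^
  -ctk q8_0 -ctv q8_0 ^
  --host 0.0.0.0 --port 13210 ^
  --mmproj .\models\30B\gemma-4-26B-A4B-it-heretic-mmproj.f32.gguf ^
  --temp 0.6 --top-k 64 --top-p 0.95 --min-p 0.0
```

If throughput recovers at the smaller context, the original slowdown was likely the oversized cache rather than the model itself, and the context can then be raised incrementally.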

Does anyone have any working llama.cpp init strings that they can share?

submitted by /u/AlwaysLateToThaParty