More Gemma4 fixes in the past 24 hours

Reddit r/LocalLLaMA / 4/11/2026


Key Points

  • The article notes recent fixes for Google’s Gemma 4 models, including a merged “reasoning budget fix” in the llama.cpp repository.
  • Google has provided new Jinja chat templates for multiple Gemma 4 variants (31B, 26B, E4B, and E2B) specifically aimed at improving tool-calling behavior.
  • It recommends applying the new templates manually unless you have re-downloaded a GGUF that was already updated to include them, rather than relying on older defaults.
  • The post shows how to apply a specific template in llama.cpp via the `--chat-template-file` command-line argument and provides an example configuration for running a 26B model with limited VRAM.
  • It includes an example llama.cpp/llama-server parameter set, including enabling “thinking” for certain model modes and setting a `reasoning_budget` value.

Reasoning budget fix (merged): https://github.com/ggml-org/llama.cpp/pull/21697

New chat templates from Google to fix tool calling:

31B: https://huggingface.co/google/gemma-4-31B-it/blob/main/chat_template.jinja

26B: https://huggingface.co/google/gemma-4-26B-A4B-it/blob/main/chat_template.jinja

E4B: https://huggingface.co/google/gemma-4-E4B-it/blob/main/chat_template.jinja

E2B: https://huggingface.co/google/gemma-4-E2B-it/blob/main/chat_template.jinja

Please correct me if I'm wrong, but you should use these new templates unless you have re-downloaded a GGUF that was updated in the past 24 hours to include the new template.

You can point llama.cpp at a specific template file with the --chat-template-file command-line argument:

--chat-template-file /models/gemma4/gemma4_chat_template_26B.jinja 
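In full, a minimal launch might look like the sketch below. The model and template paths follow this post's setup, the port is an arbitrary placeholder, and --jinja is needed so llama-server actually renders the Jinja template rather than falling back to a built-in one:

```shell
# Minimal sketch: serve a Gemma 4 GGUF with the updated chat template.
# Paths and port are placeholders -- substitute your own.
/usr/local/bin/llama-server \
  --model /models/gemma4/gemma-4-26B-A4B-it-UD-IQ4_XS.gguf \
  --chat-template-file /models/gemma4/gemma4_chat_template_26B.jinja \
  --jinja \
  --ctx-size 16384 \
  --port 8080
```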

My current llama-swap/llama.cpp config, 26B example (testing on 16 GB VRAM, so the context window is limited):

"Gemma4-26B-IQ4_XS":
  ttl: 300 # Automatically unloads after 5 mins of inactivity
  cmd: >
    /usr/local/bin/llama-server
    --port ${PORT}
    --host 127.0.0.1
    --model /models/gemma4/gemma-4-26B-A4B-it-UD-IQ4_XS.gguf
    --mmproj /models/gemma4/gemma-4-26B-A4B-it.mmproj-q8_0.gguf
    --chat-template-file /models/gemma4/gemma4_chat_template_26B_09APR2026.jinja
    --cache-type-k q8_0
    --cache-type-v q8_0
    --n-gpu-layers 99
    --parallel 1
    --batch-size 2048
    --ubatch-size 512
    --ctx-size 16384
    --image-min-tokens 300
    --image-max-tokens 512
    --flash-attn on
    --jinja
    --cache-ram 2048
    -ctxcp 2
  filters:
    stripParams: "temperature, top_p, top_k, min_p, presence_penalty, repeat_penalty"
  setParamsByID:
    "${MODEL_ID}:thinking":
      chat_template_kwargs:
        enable_thinking: true
        reasoning_budget: 4096
      temperature: 1.0
      top_p: 0.95
      top_k: 64
      min_p: 0.0
      presence_penalty: 0.0
      repeat_penalty: 1.0
    "${MODEL_ID}:thinking-coding":
      chat_template_kwargs:
        enable_thinking: true
        reasoning_budget: 4096
      temperature: 1.5
      top_p: 0.95
      top_k: 65
      min_p: 0.0
      presence_penalty: 0.0
      repeat_penalty: 1.0
    "${MODEL_ID}:instruct":
      chat_template_kwargs:
        enable_thinking: false
      temperature: 1.0
      top_p: 0.95
      top_k: 64
      min_p: 0.0
      presence_penalty: 0.0
      repeat_penalty: 1.0
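With the server running, the per-mode settings in the config are selected by the model ID suffix, since llama-swap routes OpenAI-compatible requests by the "model" field. A hypothetical client call for the ":thinking" mode could look like this (host, port, and prompt are assumptions, not from the post; the setParamsByID block injects the sampling and thinking parameters server-side, so the request stays minimal):

```shell
# Query the ":thinking" variant via the OpenAI-compatible endpoint.
# Host/port are assumptions -- match them to your llama-swap setup.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Gemma4-26B-IQ4_XS:thinking",
        "messages": [{"role": "user", "content": "Summarize the Gemma 4 template fix."}]
      }'
```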
submitted by /u/andy2na