Gemma 4 thinking system prompt

Reddit r/LocalLLaMA / 4/8/2026


Key Points

  • The post asks how to reliably enable/disable the “thinking” (reasoning) feature in Gemma 4 models using system prompts or chat templates.
  • The author reports inconsistent behavior when attempting to control reasoning via system prompts with the 26B model, sometimes following instructions to skip reasoning and sometimes not.
  • As a workaround, placing a special token like `<thought off>` in the user prompt before the actual content seems to work, but it’s considered impractical for API integrations.
  • The question invites others to share whether anyone has devised a robust method to toggle thinking on/off specifically for Gemma 4.

I like to be able to enable and disable thinking using a system prompt, so that I can control which prompts generate thinking tokens rather than relying on the model to choose for me. It's one of the reasons I loved Qwen-30b-A3b.

I'm having trouble getting this same setup working for the Gemma 4 models. Right now I'm playing with the 26B. The model will sometimes honor a system prompt asking it to skip reasoning, and sometimes not. If I put `<thought off>` in the user prompt before my own content, that seems to work well. However, that isn't really practical for API calls and the like.
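In the meantime, the prompt-prefix workaround can at least be hidden behind a small client-side helper so API callers never see the token. A minimal sketch in Python, assuming OpenAI-style message dicts; the `<thought off>` token is the one reported above, and I'm not claiming it's an officially documented control token:

```python
def with_thinking_disabled(messages):
    """Prepend the reported `<thought off>` token to the most recent
    user message so the model (reportedly) skips its reasoning phase.

    `messages` is an OpenAI-style list of {"role", "content"} dicts.
    Returns a copy; the caller's list is left untouched.
    """
    out = [dict(m) for m in messages]
    for m in reversed(out):
        if m["role"] == "user":
            m["content"] = "<thought off> " + m["content"]
            break
    return out


msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
]
print(with_thinking_disabled(msgs)[1]["content"])
# → <thought off> What is 2 + 2?
```

You'd call this just before handing `messages` to your chat-completions client, so the token never leaks into stored conversation history.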

I'm curious whether anyone has devised a way to toggle thinking on/off using system prompts and/or chat templates with the Gemma 4 models?
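One chat-template direction worth trying: Qwen3's template gates its thinking block behind an `enable_thinking` kwarg passed to `apply_chat_template`, and the same pattern could be grafted onto a Gemma template. A hedged Jinja sketch, where `<start_of_turn>model` is the standard Gemma turn marker and `<thought off>` is the token reported above (both untested as an official mechanism):

```jinja
{#- Hypothetical fragment for the generation prompt: honor an
    `enable_thinking` kwarg, as Qwen3's chat template does. -#}
{%- if enable_thinking is defined and not enable_thinking %}
{{- '<start_of_turn>model\n<thought off>' }}
{%- else %}
{{- '<start_of_turn>model\n' }}
{%- endif %}
```

The upside over the prompt-prefix trick is that the token is injected server-side at template time, so plain API calls stay clean.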

submitted by /u/No_Information9314