Qwen3.5 27B refuses to stop thinking

Reddit r/LocalLLaMA / 3/15/2026

💬 Opinion | Tools & Practical Usage

Key Points

A Reddit post reports that thinking cannot be turned off for Qwen3.5 27B in llama-server with either --chat-template-kwargs '{"enable_thinking": false}' or --reasoning off, although the same options worked for several other Qwen and Nemotron models the author tried.
The model still reasons before answering: the output has no opening <think> tag but closes the reasoning with </think>, which suggests the suppression options have no effect on this variant.
Among the models the author tested, only Qwen3.5 27B shows the behavior. The thread was posted to r/LocalLLaMA by /u/liftheavyscheisse and reports llama.cpp build b8295.
The author asks whether others have hit the problem or know a workaround, which points to a possible bug or model-specific behavior.

I've tried --chat-template-kwargs '{"enable_thinking": false}' and its successor --reasoning off in llama-server, and although it works for other models (I've tried successfully on several Qwen and Nemotron models), it doesn't work for the Qwen3.5 27B model.

It just thinks anyway (without inserting a <think> tag, but it finishes its thinking with </think>).

Anybody else have this problem / know how to solve it?
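
Not a fix for the flags themselves, but the symptom described above (no opening <think>, a closing </think> before the real answer) can at least be handled on the client side. The following is a minimal sketch in Python, assuming the raw response text is already in hand; the function name split_orphan_think is hypothetical and not part of llama.cpp or any Qwen tooling.

```python
def split_orphan_think(text: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split model output into (reasoning, answer).

    Assumption: the reasoning, if any, ends at the first close_tag,
    whether or not an opening <think> tag was emitted.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", text.strip()
    # Drop an opening tag if the model did emit one.
    reasoning = head.strip().removeprefix("<think>").strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    raw = "The user said hi, so I should greet them.\n</think>\n\nHello!"
    reasoning, answer = split_orphan_think(raw)
    print(answer)
```

This only hides the reasoning from the final text. The model still spends tokens generating it, so it does not answer the question of why the options are ignored.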

llama.cpp b8295

submitted by /u/liftheavyscheisse