DeepSeek 3.2 eating the opening think tag on llama.cpp server?

Reddit r/LocalLLaMA / 4/20/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The post reports a local LLM inference issue where DeepSeek V3.2 emits reasoning text but omits the opening `<think>` tag from the token stream served via llama-server.
  • The user observes that only the closing `</think>` tag appears at the end of the output, so Open WebUI fails to collapse the thought block.
  • They are running a GGUF model (noted as “Unsloth GGUF”) on a 512GB machine and using llama-server with options including high thread count and flash-attn.
  • Toggling reasoning on/off does not resolve the missing opening tag, and the user is unsure whether the chat template is broken for these GGUFs or whether a server/template flag is missing.

Hey guys. Having a weird issue with the new DeepSeek V3.2 Unsloth GGUF via llama-server. The model starts reasoning fine, but the opening `<think>` tag is missing from the output stream. I just see the plain-text reasoning, and then the closing `</think>` tag at the end.

Because of this, Open WebUI doesn't collapse the thought block. I'm on a 512GB box; the command is just `llama-server -m model_name -t 32 --flash-attn on`. Tried toggling reasoning on/off, didn't help.
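As a stopgap while the template issue is sorted out, the symptom described above (reasoning text with only a closing tag) can be patched client-side before the text reaches the UI. A minimal sketch, assuming the tags arrive literally as `<think>`/`</think>` strings; the function name and tag defaults here are illustrative, not part of any llama.cpp or Open WebUI API:

```python
def patch_think_block(text: str,
                      open_tag: str = "<think>",
                      close_tag: str = "</think>") -> str:
    """If the model output contains a closing reasoning tag but no opening
    one, prepend the opening tag so downstream UIs can collapse the block.
    Output that already has both tags (or neither) is returned unchanged."""
    if close_tag in text and open_tag not in text:
        return open_tag + text
    return text


# Example: a stream that lost its opening tag gets repaired,
# while well-formed output passes through untouched.
broken = "First I consider the options...</think>The answer is 42."
fixed = patch_think_block(broken)
```

For a streaming client this check would need to run once the closing tag has been seen (or buffered across chunks), since the opening tag can only be known to be missing after enough of the stream has arrived.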

Is the chat template broken in these specific GGUFs or am I missing a flag?

submitted by /u/Winter_Engineer2163