PSA: Having issues with Qwen3.5 overthinking? Give it a tool, and it can help dramatically.

Reddit r/LocalLLaMA / 4/14/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The post addresses a common issue with Qwen 3.5 “overthinking” and proposes practical configuration changes to reduce it.
  • It recommends verifying sampling parameters, in particular setting presence_penalty to roughly 1.0–1.5 (with some experimentation).
  • The key workaround is enabling tools/function calling: when tools are available, Qwen 3.5 shifts from a long “reasoning trace” to a shorter, more natural response style.
  • The author reports testing via llama-server in Open-WebUI (ensuring “native” function calling is enabled) and notes that other tool-enabled harnesses should already avoid the problem.
  • The TL;DR is to enable tools (even if not actually used) and follow the recommended sampling guidance to mitigate overthinking.

I'm sure everyone has seen the posts from people talking about Qwen 3.5 over-thinking, or maybe you've experienced it yourself. Considering we're like 2 months out from the release and I still see people talk about this issue, I decided it might be a good idea to put this thread out there.

First, the obvious - make sure your sampling parameters are set correctly. This is the first part of the "fix" and relates to the presence_penalty value. Set this to 1.0-1.5. Experiment a little if you're willing. This is something most of you here likely already know, too. So let's get to the "real" fix.
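For anyone setting this via the API rather than a UI, here's a minimal sketch of what that looks like as an OpenAI-compatible request payload for llama-server. The model name, prompt, and temperature are placeholders for your own setup; only the presence_penalty range comes from the advice above.

```python
import json

# Hypothetical chat-completions payload for a local llama-server
# (OpenAI-compatible endpoint). Model name and prompt are placeholders.
payload = {
    "model": "qwen3.5",  # placeholder; use whatever name your server exposes
    "messages": [
        {"role": "user", "content": "Summarise the trade-offs of B-trees."}
    ],
    "presence_penalty": 1.2,  # somewhere in the suggested 1.0-1.5 range
    "temperature": 0.7,       # placeholder; not part of the fix itself
}

# POST this to your server's /v1/chat/completions endpoint with any
# HTTP client; printed as JSON here so it drops straight into curl.
print(json.dumps(payload, indent=2))
```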

When Qwen 3.5 has no tools available, it engages in a Gemini 3/Gemma 4-like reasoning trace. This is the nice, bullet list style as seen here.

This is relevant because when you enable tools for 3.5, the reasoning style changes completely: it instead produces a short, more natural Claude-like trace as shown here. If you've used Claude, you probably immediately recognise this style. For context, this is with the model running via llama-server inside Open-WebUI. All I did was enable the built-in tools it comes with. (Note if using OWI: make sure you enable "native" function calling.) This isn't only applicable to OWI, though. If you're using a harness that already has tools, like OpenCode or Hermes Agent, you shouldn't have any overthinking problems in the first place.

But yeah, that's essentially all there is to it. So, if you're running the model with no tools, I'd strongly recommend adding some. Apparently even just telling it that it has fake tools works too, but I haven't tried this myself.
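If your client talks to the API directly, adding a tool just means including a tools array in the request, even one the conversation never calls. A minimal sketch, assuming an OpenAI-compatible endpoint; the tool name and schema below are made up purely for illustration:

```python
import json

# Hypothetical "dummy" tool declaration: its mere presence is what
# changes the reasoning style, per the post. Name/schema are invented.
dummy_tool = {
    "type": "function",
    "function": {
        "name": "get_current_time",  # placeholder; any plausible tool works
        "description": "Return the current local time.",
        "parameters": {"type": "object", "properties": {}},
    },
}

payload = {
    "model": "qwen3.5",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Explain what a mutex is, briefly."}
    ],
    "tools": [dummy_tool],    # the model sees tools are available
    "presence_penalty": 1.2,  # sampling fix from earlier in the post
}

print(json.dumps(payload, indent=2))
```

The model may still emit a tool call you'd have to handle (or ignore), which is presumably why the author suggests experimenting rather than treating this as a guaranteed fix.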

I hope this helps anybody who has been dealing with this. :)

TL;DR: Enable a tool even if you aren't using it, and make sure you've got your sampling params set according to Unsloth's guide.

submitted by /u/ayylmaonade