Just swapped Qwen 3.5 for the 3.6 variant (FP8, RTX 6000 Pro) using the same recommended generation settings. My stack is vLLM (v0.19.0) + Open WebUI (v0.8.12) in a RAG setup where the model has access to several document retrieval tools.
After some initial testing (single-turn only; I haven't tried disabling interleaved reasoning yet), I’ve noticed some significant shifts:
- 3.6 is far more "talkative" with tools: reasoning tokens per tool call have jumped from a few dozen to several hundred, a 2x-3x increase in overall output.
- It struggles to follow specific instructions compared to 3.5.
- It seems to give the system prompt much less weight, sometimes ignoring it entirely.
- Despite being prompted for exhaustive answers, the final responses are significantly shorter.
I suspect a potential issue with the chat template or how vLLM handles the new weights, even though the architecture is the same. Anyone else seeing similar problems?
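One way to check the chat-template theory is to render the same tool-call conversation with each checkpoint's tokenizer (via `tokenizer.apply_chat_template(messages, tools=tools, tokenize=False)`) and diff the two rendered prompts. A minimal sketch of the diffing step with stdlib `difflib`; the two prompt strings below are illustrative stand-ins, not the actual templates of either model:

```python
import difflib

# Stand-in rendered prompts: in practice, obtain each string from
# tokenizer.apply_chat_template(...) for the 3.5 and 3.6 checkpoints.
prompt_35 = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nSearch the docs for X.<|im_end|>\n"
)
prompt_36 = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nSearch the docs for X.<|im_end|>\n"
    "<|im_start|>assistant\n<think>\n"  # hypothetical extra turn opener
)

# Unified diff makes template drift (extra tags, changed tool stanzas,
# forced reasoning openers) immediately visible.
diff = list(difflib.unified_diff(
    prompt_35.splitlines(keepends=True),
    prompt_36.splitlines(keepends=True),
    fromfile="qwen3.5", tofile="qwen3.6",
))
print("".join(diff))
```

If the diff shows structural changes around the tool definitions or an injected reasoning opener, that would explain the behavior shift independently of the weights.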
EDIT:
- I swapped Qwen3.5-35B-A3B for Qwen3.6-35B-A3B, nothing else.
- Prompts that worked well before no longer work as well.
- The extra reasoning is significant WITH TOOLS.
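To rule the template out entirely, vLLM lets you pin a chat template explicitly instead of using the one bundled with the new checkpoint, e.g. by exporting the 3.5 template and serving 3.6 with it. The flags below are real vLLM options; the model repo name and template path are assumptions based on this post:

```shell
# Serve the 3.6 weights but force the (exported) 3.5 chat template.
# --chat-template overrides the template shipped in tokenizer_config.json.
vllm serve Qwen/Qwen3.6-35B-A3B \
  --quantization fp8 \
  --chat-template ./qwen3.5-template.jinja \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

If behavior reverts with the old template, the weights are fine and the regression is in the template; if not, it points back at the checkpoint itself.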

