I'm running Qwen/Qwen3.6-27B-FP8 via vLLM with this command:

```
vllm serve Qwen/Qwen3.6-27B-FP8 \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.95 \
  --max-num-seqs 8 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_xml \
  --enable-prefix-caching \
  --attention-backend flashinfer
```
It works pretty well in Claude Code, except that fairly often it will announce it's about to do something, then just stop and wait for a user response. E.g.:
```
Let me continue with the remaining edits.
✻ Brewed for 48s
```
(waiting for user input)
There's no error message and no failed tool call as far as I can tell; it just fails to follow through. Sometimes it does this several times in a row, even commenting "The user replied 'continue' - they want me to continue. Let me continue with the remaining edits." and then stopping at a user prompt again, waiting for me to reply.
Is this just a deficiency in the model's reasoning, an incompatibility between Claude Code's prompts and the model, or an error in my configuration?
I haven't seen this happen in OpenCode, but there are reasons I prefer Claude Code for some tasks.
Thanks.