
Implementing reasoning-budget in Qwen3.5

Reddit r/LocalLLaMA / 3/20/2026

💬 Opinion | Tools & Practical Usage | Models & Research

Key Points

  • The post asks how to implement reasoning-budget for Qwen3.5 using vLLM or SGLang in Python.
  • The author reports the model consistently uses about 1500 tokens for reasoning regardless of attempts to adjust it.
  • The question was submitted by user /u/DingyAtoll on Reddit, with a link to the LocalLLaMA discussion thread.
  • The thread focuses on understanding and controlling the reasoning budget, which impacts latency, cost, and output behavior.

Can anyone please tell me how I am supposed to implement reasoning-budget for Qwen3.5 with either vLLM or SGLang in Python? No matter what I try, it just thinks for about 1500 tokens for no reason, and it's driving me insane.

submitted by /u/DingyAtoll
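
One possible way to approach the question, as a minimal sketch and not a confirmed recipe: cap the reasoning in a first generation pass, force the thinking block closed if the cap was hit, then generate the answer in a second pass. This assumes Qwen3.5 wraps its reasoning in `<think>` ... `</think>` the way Qwen3 does, and that the prompt already ends with the assistant turn opened for thinking. The `generate` callable, the function name, and the wording of the early-stop sentence are all hypothetical; the sketch does not rely on any built-in reasoning-budget option in vLLM or SGLang.

```python
from typing import Callable, Tuple

# Assumption: Qwen3.5 closes its reasoning block with the same tag as Qwen3.
THINK_CLOSE = "</think>"

# Assumption: illustrative wording for cutting the reasoning short.
EARLY_STOP = (
    "\n\nI have used up my reasoning budget, so I will answer now.\n"
    + THINK_CLOSE
    + "\n\n"
)


def generate_with_budget(
    generate: Callable[[str, int], str],
    prompt: str,
    reasoning_budget: int,
    answer_tokens: int,
) -> Tuple[str, str]:
    """Two-pass generation with a hard cap on reasoning tokens.

    `generate(prompt, max_tokens)` is a hypothetical callable that returns
    the raw completion text for `prompt`, limited to `max_tokens` tokens.
    Returns a `(reasoning, answer)` pair.
    """
    # Pass 1: let the model think, but never beyond the budget.
    first = generate(prompt, reasoning_budget)

    end = first.find(THINK_CLOSE)
    if end != -1:
        # The model finished thinking inside the budget. Keep the reasoning
        # and discard any partial answer that was cut off by the cap.
        reasoning = first[: end + len(THINK_CLOSE)] + "\n\n"
    else:
        # The budget ran out mid-thought: close the block ourselves.
        reasoning = first + EARLY_STOP

    # Pass 2: continue from the closed reasoning block to get the answer.
    answer = generate(prompt + reasoning, answer_tokens)
    return reasoning, answer
```

With vLLM's offline API, `generate` could be a small wrapper that calls `LLM.generate` with `SamplingParams(max_tokens=...)` and returns `outputs[0].text` from the first result; against an OpenAI-compatible server from vLLM or SGLang, a wrapper around the raw completions endpoint with `max_tokens` would play the same role. Either way the prompt has to be the fully templated string, since the second pass continues the assistant turn instead of starting a new one.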