Qwen3.6-27B at ~80 tps with 218k context window on 1x RTX 5090 served by vllm 0.19

Reddit r/LocalLLaMA / 4/25/2026

📰 News · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • Qwen3.6-27B is being shared publicly on Hugging Face, including an NVFP4 variant with MTP (multi-token prediction) that was released a few days ago.
  • A creator reports achieving around 80 tokens/second on a single RTX 5090 while using a 218k context window by following the same approach used for Qwen3.5-27B.
  • The performance is claimed to be enabled by the latest vLLM 0.19 builds, specifically vLLM 0.19.1rc1.
  • The post points to prior community discussion and a related Qwen3.5-27B RTX 5090/vLLM performance report as supporting reference material.

Qwen3.6-27B has been out for a few days, and the NVFP4 build with MTP dropped earlier on HF: https://huggingface.co/sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP

You can follow the same recipe I used for Qwen3.5-27B to achieve ~80 tps on a single RTX 5090 with a 218k context window via the latest vLLM 0.19 builds (vLLM 0.19.1rc1):

https://www.reddit.com/r/LocalLLaMA/comments/1sr8gyf/qwen3527b_on_rtx_5090_served_via_vllm_77_tps/
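The "recipe" itself is only linked, not spelled out. A minimal sketch of what such a vLLM launch typically looks like, assuming the model name from the HF link above and that vLLM auto-detects the NVFP4 quantization and the MTP draft module from the checkpoint config (the exact flags, context-length value, and memory fraction here are illustrative, not the author's exact command):

```shell
# Illustrative vLLM launch sketch; not the poster's verified command.
# Assumes NVFP4 quantization and the MTP speculative-decoding module are
# picked up automatically from the checkpoint's config files.
vllm serve sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP \
  --max-model-len 218000 \
  --gpu-memory-utilization 0.95
```

Once the server is up, it exposes vLLM's usual OpenAI-compatible API (by default at http://localhost:8000/v1), so any OpenAI-style client can be pointed at it for throughput testing.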

submitted by /u/Kindly-Cantaloupe978