I’ve been testing Qwen3.6 27B on a pretty non-standard local setup and figured the numbers might be useful for anyone looking at the newer 16GB Blackwell cards.
Hardware:
- 2x RTX 5060 Ti 16GB
- 32GB total VRAM
- Proxmox LXC
- 16 vCPU
- ~60GB RAM
- CUDA 13 / Torch 2.11 nightly
- vLLM nightly: 0.19.2rc1.dev
- Model: sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP

vLLM launch shape:

```
vllm serve sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP \
  --served-model-name qwen36-nvfp4-mtp \
  --tensor-parallel-size 2 \
  --max-model-len 204800 \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 1 \
  --gpu-memory-utilization 0.95 \
  --kv-cache-dtype fp8 \
  --quantization modelopt \
  --speculative-config '{"method":"mtp","num_speculative_tokens":3}' \
  --reasoning-parser qwen3 \
  --language-model-only \
  --generation-config vllm \
  --disable-custom-all-reduce \
  --attention-backend TRITON_ATTN
```

Performance so far:
- 8K context, MTP n=1: ~50–52 tok/s
- 8K context, MTP n=3: ~62–66 tok/s
- 32K context: ~59–66 tok/s
- 204800 context starts and works, but is tight
- Idle VRAM at 204k: ~14.45GiB per GPU
- After a 168k-token prefill: ~15.65GiB per GPU
- 168k-token needle/retrieval smoke test passed in ~256s
- Near-limit test correctly rejected prompt+output over the 204800 window
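The jump from ~51 tok/s (MTP n=1) to ~64 tok/s (n=3) implies a reasonable draft acceptance rate. As a back-of-the-envelope sketch (not from my logs; the acceptance probability `alpha` is an assumed illustration, and per-step verification overhead is ignored), the standard speculative-decoding expectation is:

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification step,
    assuming i.i.d. per-token acceptance probability alpha and
    k draft (speculative) tokens per step.
    This is the geometric series 1 + alpha + ... + alpha**k."""
    return (1 - alpha ** (k + 1)) / (1 - alpha)

# The post saw ~51 tok/s with n=1 and ~64 tok/s with n=3 (ratio ~1.25).
# See which assumed acceptance rate is roughly consistent with that:
for alpha in (0.5, 0.6, 0.7):
    r = expected_tokens_per_step(alpha, 3) / expected_tokens_per_step(alpha, 1)
    print(f"alpha={alpha}: n=3 vs n=1 speedup ~= {r:.2f}")
```

With this toy model, alpha around 0.5 lines up with the observed ~1.25x, which is plausible for an MTP head on a quantized checkpoint.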
Thinking mode works too, but you need to give it enough output budget. With low max_tokens, Qwen can spend the whole cap on reasoning and return no final content. Around 1024+ is fine for small prompts, and 4096–8192 is safer for actual reasoning tasks.
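For concreteness, here's a minimal request sketch against the OpenAI-compatible endpoint vLLM exposes (default `http://localhost:8000/v1/chat/completions`; the model name matches `--served-model-name` above, and the prompt is made up):

```python
import json

# Give thinking mode enough output budget: with a low max_tokens the
# model can burn the entire cap on reasoning and return no final answer.
payload = {
    "model": "qwen36-nvfp4-mtp",
    "messages": [
        {"role": "user", "content": "Explain KV-cache paging briefly."}
    ],
    "max_tokens": 4096,  # 1024+ for small prompts; 4096-8192 for real reasoning tasks
    "temperature": 0.7,
}
body = json.dumps(payload)
```

Send `body` with any HTTP client (e.g. `requests.post(url, data=body, headers={"Content-Type": "application/json"})`).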
Caveats:
- 204k context is right on the edge with 2x16GB. `gpu_memory_utilization=0.94` failed KV allocation; `0.95` worked.
- Startup takes several minutes due to compile/autotune.
- Logs show FlashInfer autotuner OOM fallbacks during startup, but the server still becomes healthy.
- I had better luck with `TRITON_ATTN` for the text path.
- This is not a high-concurrency config: `max_num_seqs=1`.
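To see why 204k is on the edge, here's the usual KV-cache sizing formula: 2 (K and V) x layers x KV heads x head dim x tokens x bytes per element, sharded across TP ranks. The architecture numbers below are placeholders, not the real Qwen3.6-27B config; substitute values from the checkpoint's `config.json`.

```python
def kv_cache_gib(num_layers: int, num_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 1,
                 tp_size: int = 1) -> float:
    """Rough KV-cache footprint per GPU in GiB.

    2x for K and V; bytes_per_elem=1 matches the fp8 KV cache used here;
    KV heads are sharded across tensor-parallel ranks.
    """
    total = 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem
    return total / tp_size / 2**30

# Placeholder architecture (assumed, NOT the real model config):
gib = kv_cache_gib(num_layers=48, num_kv_heads=8, head_dim=128,
                   context_len=204800, bytes_per_elem=1, tp_size=2)
print(f"{gib:.2f} GiB per GPU")  # prints "9.38 GiB per GPU" for these numbers
```

With ~14.5GiB already resident per GPU at idle, a single-digit-GiB KV allocation at full context explains why 0.94 utilization failed and 0.95 barely fit.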
Overall: dual 5060 Ti 16GB seems surprisingly usable for Qwen3.6 27B if you use the right checkpoint/runtime combo. It’s not roomy, but it works.