Previously it was throwing a 'Not Implemented' error due to Mamba layers. Going to test it now!
https://github.com/vllm-project/vllm/pull/39931
Edit: Works with Qwen 3.6, tested with the 27B variant.
It can be used with the argument:
--kv-cache-dtype turboquant_4bit_nc
Other available options (see the launch sketch after this list):
- turboquant_k8v4
- turboquant_4bit_nc
- turboquant_k3v4_nc
- turboquant_3bit_nc
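For reference, a minimal launch sketch. This assumes the standard `vllm serve` entry point; the model name is a placeholder, and the turboquant dtype values come from the PR above.

```bash
# Sketch: serve a model with TurboQuant 4-bit KV cache quantization.
# <model> is a placeholder for your model path or HF repo id.
vllm serve <model> \
  --kv-cache-dtype turboquant_4bit_nc
```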
When running with --enable-chunked-prefill it complained about mamba alignment; you just need to set the batched-token limit higher than the value the error reports. --max-num-batched-tokens 4096 fixed it for me (see the sketch below).
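A sketch of the full command with that workaround applied (again assuming `vllm serve`; 4096 is simply the value that worked for me, anything above the alignment value in the error should do):

```bash
# Chunked prefill needs --max-num-batched-tokens larger than the
# mamba alignment value reported in the error; 4096 worked here.
vllm serve <model> \
  --kv-cache-dtype turboquant_4bit_nc \
  --enable-chunked-prefill \
  --max-num-batched-tokens 4096
```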