| I tested Qwen3.5 27B with vLLM, comparing the original bf16 weights against Qwen's official FP8 quantization, and an 8-bit KV cache against the original 16-bit cache. I got practically identical results; I attribute the small difference to random noise, as I only ran each configuration once. The test was done using the Aider benchmark on an RTX 6000 Pro. My conclusion is that one should use FP8 for both weights and cache, which dramatically increases the amount of context available. |
Qwen3.5-27b 8 bit vs 16 bit
Reddit r/LocalLLaMA / 3/17/2026
💬 Opinion · Signals & Early Trends · Tools & Practical Usage
Key Points
- The author compared Qwen3.5-27B with vLLM using the original bf16 version versus Qwen's official FP8 quantization, and an 8-bit KV cache versus the original 16-bit cache.
- Results were practically identical, with any small differences attributed to random noise since each run was performed only once.
- The test used the Aider benchmark on an RTX 6000 Pro.
- The conclusion is that fp8 should be used for both weights and cache to dramatically increase the amount of context available.
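The setup described above can be sketched as a vLLM launch command. This is a minimal illustration, not the author's exact invocation: the model path is an assumption (the post does not give the repository name), while `--kv-cache-dtype fp8` is vLLM's real flag for enabling an 8-bit KV cache.

```shell
# Serve a pre-quantized FP8 checkpoint with an FP8 KV cache in vLLM.
# Model path is hypothetical; substitute the actual FP8 repo you are using.
vllm serve Qwen/Qwen3.5-27B-FP8 \
  --kv-cache-dtype fp8 \
  --max-model-len 131072
```

Because both the weights and the KV cache are stored in 8 bits instead of 16, the memory saved can be spent on a larger `--max-model-len`, which is the context-length benefit the post's conclusion points to.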
