AI Navigate

Qwen3.5-27B 8-bit vs 16-bit

Reddit r/LocalLLaMA / 3/17/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The author benchmarked Qwen3.5-27B under vLLM, comparing the original bf16 weights against Qwen's own -fp8 quantization, and an 8-bit KV cache against the default 16-bit cache.
  • Results were practically identical, with any small differences attributed to random noise since each run was performed only once.
  • The test used the Aider benchmark on an RTX 6000 Pro.
  • The conclusion is that fp8 should be used for both weights and cache to dramatically increase the amount of context available.
Qwen3.5-27B 8-bit vs 16-bit

I tested Qwen3.5 27B with vLLM, using the original bf16 version vs the Qwen-made -fp8 quantization, and using an 8-bit KV cache vs the original 16-bit cache. I got practically identical results. I attribute the small difference to random noise, as I only ran each configuration once.

The test was done using the Aider benchmark on an RTX 6000 Pro.

My conclusion is that one should use fp8 for both weights and KV cache. This will dramatically increase the amount of context available.

submitted by /u/Baldur-Norddahl