Qwen3.6-27B KLDs - INTs and NVFPs

Reddit r/LocalLLaMA / 4/23/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The post shares initial KLD (Kullback–Leibler divergence) measurements for quantized variants of the Qwen3.6-27B model, emphasizing that the right choice of quant depends heavily on the intended use case.
  • It highlights that the THoTD NVFP variant is larger because it uses an NVFP4A16 configuration versus NVFP4(A4), and suggests NVFP4(A4) may perform better under batching since it stays in 4-bit throughout.
  • It notes a significant size jump for Cyan when moving from INT4 to BF16-INT4, raising a trade-off question between mixed-precision accuracy gains and increased memory/context cost.
  • The author indicates they will add more data to the graph as additional variants become available, encouraging readers to pick the correct quant the first time.

https://preview.redd.it/oe958ecy6twg1.png?width=1484&format=png&auto=webp&s=9649d1833be88ec140e2d4fb96b1a66b2bfe6522

Will do more, but here's a start, as you're choosing your models. Remember, USE-CASE is important:

  • Notice the larger size of THoTD NVFP versus the other NVFP variant. This is because THoTD is NVFP4A16 versus NVFP4(A4).
    • NVFP4(A4) should stay in 4-bit the whole time, so if you are doing batching, NVFP4(A4) may see better performance as batch size grows (see the first sketch after this list)
  • Notice the huge size increase for Cyan from INT4 to BF16-INT4.
    • More food for thought: mixed precision is amazing, but takes more space. Is 0.02 accuracy worth losing 6GB of context? Up to you to decide (back-of-envelope math in the second sketch below).
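
To make the batching point concrete, here's a rough sketch of how per-layer activation memory scales with batch size when activations stay in 4-bit (A4) versus getting upcast to 16-bit (A16). The hidden size and sequence length are made-up illustration values, not Qwen3.6-27B's actual dims:

```python
# Rough activation-memory sketch (all shapes are assumptions, for illustration).
def activation_gib(batch, seq_len, hidden, act_bits):
    """Approximate per-layer activation footprint for one forward pass."""
    return batch * seq_len * hidden * act_bits / 8 / 2**30

SEQ_LEN = 4096   # assumed prompt length
HIDDEN = 5120    # assumed hidden size

for batch in (1, 4, 16, 64):
    a16 = activation_gib(batch, SEQ_LEN, HIDDEN, 16)
    a4 = activation_gib(batch, SEQ_LEN, HIDDEN, 4)
    print(f"batch={batch:>2}: A16 ~{a16:5.2f} GiB/layer vs A4 ~{a4:5.2f} GiB/layer")
```

The gap is a constant 4x, but it's 4x of something that grows linearly with batch size, which is why staying in 4-bit tends to matter more the harder you batch.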
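
And here's the back-of-envelope math behind the "6GB of context" question. The ~6GB delta is from the graph; the GPU size, INT4 footprint, and KV-cache cost per token are numbers I made up purely to show how the trade-off plays out:

```python
# Hypothetical VRAM budget: every number except the 6 GiB delta is assumed.
GPU_GIB = 24.0                                   # assumed 24 GB card
INT4_WEIGHTS_GIB = 14.0                          # assumed INT4 footprint
BF16_INT4_WEIGHTS_GIB = INT4_WEIGHTS_GIB + 6.0   # the ~6 GB jump from the graph
KV_KIB_PER_TOKEN = 160.0                         # assumed KV-cache cost per token

def max_context_tokens(weights_gib):
    """Tokens of KV cache that fit in whatever the weights leave free."""
    free_bytes = (GPU_GIB - weights_gib) * 2**30
    return int(free_bytes / (KV_KIB_PER_TOKEN * 2**10))

print("INT4     :", max_context_tokens(INT4_WEIGHTS_GIB), "tokens")
print("BF16-INT4:", max_context_tokens(BF16_INT4_WEIGHTS_GIB), "tokens")
```

Same 0.02 accuracy question, but now in tokens: under these assumptions the mixed-precision quant costs you well over half your usable context.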

As more variants come online I will add them to the graph. The more you know, the better your chances of grabbing the right quant the first time!!
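
For anyone who wants to reproduce or sanity-check numbers like these, here's a minimal sketch of how per-token KLD against the full-precision model is commonly measured. Model paths are placeholders and it assumes PyTorch + Hugging Face transformers; real harnesses run this over a large eval corpus, not one sentence:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

REF_ID = "path/to/full-precision-model"  # placeholder
QNT_ID = "path/to/quantized-model"       # placeholder

tok = AutoTokenizer.from_pretrained(REF_ID)
ref = AutoModelForCausalLM.from_pretrained(REF_ID, torch_dtype=torch.bfloat16)
qnt = AutoModelForCausalLM.from_pretrained(QNT_ID)

ids = tok("The quick brown fox jumps over the lazy dog.",
          return_tensors="pt").input_ids
with torch.no_grad():
    ref_logp = F.log_softmax(ref(ids).logits, dim=-1)
    qnt_logp = F.log_softmax(qnt(ids).logits, dim=-1)

# KL(P_ref || P_qnt), summed over the vocab, averaged over token positions.
kld = F.kl_div(qnt_logp, ref_logp, log_target=True, reduction="none").sum(-1).mean()
print(f"mean per-token KLD: {kld.item():.4f}")
```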

submitted by /u/Phaelon74