Qwen3.6-27B KLDs - INTs and NVFPs

Reddit r/LocalLLaMA / 4/23/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The post shares initial KLD (Kullback–Leibler divergence) measurements for quantized variants of the Qwen3.6-27B model, emphasizing that the right choice of quant depends heavily on the intended use case.
  • It highlights that the THoTD NVFP variant is larger because it uses an NVFP4A16 configuration versus NVFP4(A4), and suggests NVFP4(A4) may perform better under batching since it stays in 4-bit throughout.
  • It notes a significant size jump for Cyan when moving from INT4 to BF16-INT4, raising a trade-off question between mixed-precision accuracy gains and increased memory/context cost.
  • The author indicates they will add more data to the graph as additional variants become available, encouraging readers to pick the correct quant the first time.

https://preview.redd.it/oe958ecy6twg1.png?width=1484&format=png&auto=webp&s=9649d1833be88ec140e2d4fb96b1a66b2bfe6522

Will do more, but here's a start, as you're choosing your models. Remember, USE-CASE is important:

  • Notice the larger size of THoTD NVFP versus the other NVFP variant. This is because THoTD is NVFP4A16 versus NVFP4(A4).
    • NVFP4(A4) should stay in 4-bit the whole time, so if you are doing batching, NVFP4(A4) may see better performance as batch size grows (see the first sketch after this list)
  • Notice the huge size increase for Cyan from INT4 to BF16-INT4.
    • More food for thought: mixed precision is amazing, but takes more space. Is 0.02 accuracy worth losing 6GB of context? Up to you to decide (back-of-envelope math in the second sketch below).
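
To make the batching point concrete, here's a rough sketch of how per-layer activation memory scales with batch size when activations stay in 4-bit (A4) versus getting upcast to 16-bit (A16). The hidden size and sequence length are made-up illustration values, not Qwen3.6-27B's actual dims:

```python
# Rough activation-memory sketch (all shapes are assumptions, for illustration).
def activation_gib(batch, seq_len, hidden, act_bits):
    """Approximate per-layer activation footprint for one forward pass."""
    return batch * seq_len * hidden * act_bits / 8 / 2**30

SEQ_LEN = 4096   # assumed prompt length
HIDDEN = 5120    # assumed hidden size

for batch in (1, 4, 16, 64):
    a16 = activation_gib(batch, SEQ_LEN, HIDDEN, 16)
    a4 = activation_gib(batch, SEQ_LEN, HIDDEN, 4)
    print(f"batch={batch:>2}: A16 ~{a16:5.2f} GiB/layer vs A4 ~{a4:5.2f} GiB/layer")
```

The gap is a constant 4x, but it's 4x of something that grows linearly with batch size, which is why staying in 4-bit tends to matter more the harder you batch.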
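
And here's the back-of-envelope math behind the "6GB of context" question. The ~6GB delta is from the graph; the GPU size, INT4 footprint, and KV-cache cost per token are numbers I made up purely to show how the trade-off plays out:

```python
# Hypothetical VRAM budget: every number except the 6 GiB delta is assumed.
GPU_GIB = 24.0                                   # assumed 24 GB card
INT4_WEIGHTS_GIB = 14.0                          # assumed INT4 footprint
BF16_INT4_WEIGHTS_GIB = INT4_WEIGHTS_GIB + 6.0   # the ~6 GB jump from the graph
KV_KIB_PER_TOKEN = 160.0                         # assumed KV-cache cost per token

def max_context_tokens(weights_gib):
    """Tokens of KV cache that fit in whatever the weights leave free."""
    free_bytes = (GPU_GIB - weights_gib) * 2**30
    return int(free_bytes / (KV_KIB_PER_TOKEN * 2**10))

print("INT4     :", max_context_tokens(INT4_WEIGHTS_GIB), "tokens")
print("BF16-INT4:", max_context_tokens(BF16_INT4_WEIGHTS_GIB), "tokens")
```

Same 0.02 accuracy question, but now in tokens: under these assumptions the mixed-precision quant costs you well over half your usable context.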

As more variants come online I will add them to the graph. The more you know, the better your chances of grabbing the right quant the first time!!
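
For anyone who wants to reproduce or sanity-check numbers like these, here's a minimal sketch of how per-token KLD against the full-precision model is commonly measured. Model paths are placeholders and it assumes PyTorch + Hugging Face transformers; real harnesses run this over a large eval corpus, not one sentence:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

REF_ID = "path/to/full-precision-model"  # placeholder
QNT_ID = "path/to/quantized-model"       # placeholder

tok = AutoTokenizer.from_pretrained(REF_ID)
ref = AutoModelForCausalLM.from_pretrained(REF_ID, torch_dtype=torch.bfloat16)
qnt = AutoModelForCausalLM.from_pretrained(QNT_ID)

ids = tok("The quick brown fox jumps over the lazy dog.",
          return_tensors="pt").input_ids
with torch.no_grad():
    ref_logp = F.log_softmax(ref(ids).logits, dim=-1)
    qnt_logp = F.log_softmax(qnt(ids).logits, dim=-1)

# KL(P_ref || P_qnt), summed over the vocab, averaged over token positions.
kld = F.kl_div(qnt_logp, ref_logp, log_target=True, reduction="none").sum(-1).mean()
print(f"mean per-token KLD: {kld.item():.4f}")
```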

submitted by /u/Phaelon74