gonna delete this as soon as it's merged, just couldn't contain my excitement. LOOK AT THAT BENCHIE:

Qwen3.5-35B-A3B (master) fully in VRAM:

| KV quant | mean KLD | 99% KLD | same top p |
| --- | --- | --- | --- |
| q8_0 | 0.003778 ± 0.000058 | 0.035869 | 97.303 ± 0.042 |
| q4_0 | 0.010338 ± 0.000085 | 0.078723 | 95.331 ± 0.055 |

| type_k | type_v | test | t/s |
| --- | --- | --- | --- |
| bf16 | bf16 | pp512 | 5263.78 ± 23.30 |
| bf16 | bf16 | tg128 | 173.58 ± 0.46 |
| q8_0 | q8_0 | pp512 | 5210.77 ± 124.88 |
| q8_0 | q8_0 | tg128 | 172.11 ± 0.50 |
| q4_0 | q4_0 | pp512 | 5263.64 ± 15.16 |
| q4_0 | q4_0 | tg128 | 171.63 ± 0.66 |

Qwen3.5-35B-A3B (attn-rot) fully in VRAM:

| KV quant | mean KLD | 99% KLD | same top p |
| --- | --- | --- | --- |
| q8_0 | 0.003702 ± 0.000039 | 0.035608 | 97.355 ± 0.042 |
| q4_0 | 0.007657 ± 0.000085 | 0.062180 | 96.070 ± 0.051 |

| type_k | type_v | test | t/s |
| --- | --- | --- | --- |
| bf16 | bf16 | pp512 | 5270.17 ± 25.16 |
| bf16 | bf16 | tg128 | 173.47 ± 0.19 |
| q8_0 | q8_0 | pp512 | 5231.55 ± 29.73 |
| q8_0 | q8_0 | tg128 | 167.07 ± 0.75 |
| q4_0 | q4_0 | pp512 | 5245.99 ± 21.93 |
| q4_0 | q4_0 | tg128 | 166.47 ± 0.72 |

Qwen3.5-27B (master) fully in VRAM:

| KV quant | mean KLD | 99% KLD | same top p |
| --- | --- | --- | --- |
| q8_0 | 0.001178 ± 0.000157 | 0.004762 | 98.987 ± 0.026 |
| q4_0 | 0.007168 ± 0.000310 | 0.041270 | 97.021 ± 0.044 |

| type_k | type_v | test | t/s |
| --- | --- | --- | --- |
| bf16 | bf16 | pp512 | 2152.75 ± 32.84 |
| bf16 | bf16 | tg128 | 42.84 ± 0.01 |
| q8_0 | q8_0 | pp512 | 2153.43 ± 32.27 |
| q8_0 | q8_0 | tg128 | 42.74 ± 0.01 |
| q4_0 | q4_0 | pp512 | 2152.57 ± 28.21 |
| q4_0 | q4_0 | tg128 | 42.66 ± 0.02 |

Qwen3.5-27B (attn-rot) fully in VRAM:

| KV quant | mean KLD | 99% KLD | same top p |
| --- | --- | --- | --- |
| q8_0 | 0.001105 ± 0.000126 | 0.004725 | 98.966 ± 0.026 |
| q4_0 | 0.005305 ± 0.000304 | 0.029281 | 97.604 ± 0.040 |

| type_k | type_v | test | t/s |
| --- | --- | --- | --- |
| bf16 | bf16 | pp512 | 2150.84 ± 31.88 |
| bf16 | bf16 | tg128 | 42.85 ± 0.02 |
| q8_0 | q8_0 | pp512 | 2141.86 ± 36.03 |
| q8_0 | q8_0 | tg128 | 42.27 ± 0.03 |
| q4_0 | q4_0 | pp512 | 2138.60 ± 31.63 |
| q4_0 | q4_0 | tg128 | 42.20 ± 0.02 |

Qwen3.5-122B-A10B (master) n-cpu-moe=27:

| KV quant | mean KLD | 99% KLD | same top p |
| --- | --- | --- | --- |
| q8_0 | 0.003275 ± 0.000027 | 0.039921 | 97.844 ± 0.038 |
| q4_0 | 0.008272 ± 0.000065 | 0.081220 | 96.281 ± 0.049 |

| type_k | type_v | test | t/s |
| --- | --- | --- | --- |
| bf16 | bf16 | pp512 | 193.94 ± 54.32 |
| bf16 | bf16 | tg128 | 27.17 ± 0.21 |
| q8_0 | q8_0 | pp512 | 191.27 ± 56.92 |
| q8_0 | q8_0 | tg128 | 27.27 ± 0.11 |
| q4_0 | q4_0 | pp512 | 194.80 ± 55.64 |
| q4_0 | q4_0 | tg128 | 27.22 ± 0.03 |

Qwen3.5-122B-A10B (attn-rot) n-cpu-moe=27:

| KV quant | mean KLD | 99% KLD | same top p |
| --- | --- | --- | --- |
| q8_0 | 0.003285 ± 0.000027 | 0.039585 | 97.824 ± 0.038 |
| q4_0 | 0.006311 ± 0.000045 | 0.064831 | 96.895 ± 0.045 |

| type_k | type_v | test | t/s |
| --- | --- | --- | --- |
| bf16 | bf16 | pp512 | 194.84 ± 56.23 |
| bf16 | bf16 | tg128 | 27.30 ± 0.17 |
| q8_0 | q8_0 | pp512 | 194.10 ± 55.76 |
| q8_0 | q8_0 | tg128 | 27.00 ± 0.10 |
| q4_0 | q4_0 | pp512 | 194.87 ± 56.16 |
| q4_0 | q4_0 | tg128 | 27.21 ± 0.06 |
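
If you want a number on the master vs attn-rot gap instead of eyeballing the tables, here's a rough sketch that computes the relative drop in mean KLD for each model and KV quant. The values are copied straight from the tables above; the names `MEAN_KLD` and `kld_reduction_pct` are just made up for this snippet, and it ignores the ± error bars, so treat small differences as noise.

```python
# Mean KLD values copied from the tables above, as (master, attn-rot) pairs.
# The ± uncertainties are deliberately left out of this quick comparison.
MEAN_KLD = {
    "Qwen3.5-35B-A3B": {"q8_0": (0.003778, 0.003702), "q4_0": (0.010338, 0.007657)},
    "Qwen3.5-27B": {"q8_0": (0.001178, 0.001105), "q4_0": (0.007168, 0.005305)},
    "Qwen3.5-122B-A10B": {"q8_0": (0.003275, 0.003285), "q4_0": (0.008272, 0.006311)},
}


def kld_reduction_pct(master: float, attn_rot: float) -> float:
    """Relative drop in mean KLD going from master to attn-rot, in percent.

    Positive means attn-rot is closer to the unquantized reference.
    """
    return (master - attn_rot) / master * 100.0


for model, quants in MEAN_KLD.items():
    for quant, (master, attn_rot) in quants.items():
        change = kld_reduction_pct(master, attn_rot)
        print(f"{model} {quant}: {change:+.1f}% mean KLD reduction")
```

On these numbers q4_0 ends up with roughly a quarter lower mean KLD on all three models, while q8_0 stays within a few percent of master either way.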