Qwen3.5-35B GGUF quants (16–22 GiB) - KLD + speed comparison
I'm back with some more benchmarks. This time I measured the KL divergence (KLD) of the current Qwen3.5-35B-A3B GGUF quantizations (16–22 GiB) available on Hugging Face.

KLD: the Kullback-Leibler divergence, which measures how closely the quantized model's next-token probability distribution matches the FP16 baseline's over a reference corpus. Lower is better; 0 means the distributions are identical.
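For anyone unfamiliar with the metric, here's a minimal sketch of what a per-token KLD computation looks like (this is an illustration, not the exact llama.cpp implementation): both models produce logits for the same token position, the logits are turned into probabilities via softmax, and the divergence is summed over the vocabulary.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    z = np.asarray(logits, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()

def kld(logits_fp16, logits_quant):
    """KL(P || Q) where P is the FP16 distribution, Q the quantized one."""
    p = softmax(logits_fp16)
    q = softmax(logits_quant)
    return float(np.sum(p * np.log(p / q)))

# Identical logits -> divergence of exactly 0
print(kld([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
# Perturbed logits -> small positive divergence
print(kld([1.0, 2.0, 3.0], [1.1, 2.0, 2.9]))
```

The reported "KLD mean" is this value averaged over every token position in the corpus.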
u/TitwitMuffbiscuit had a shot at this some time ago but unfortunately all the models got updated a short period after he published his measurements.
For this run I decided against the English-only Wikitext-2 test set and instead used the multilingual FLORES-200 dataset, from which I extracted 700 KB of lines across randomly chosen languages. I also found another interesting dataset, calibration_data_v5_rc.txt (about 400 KB), which covers topics such as programming, math, syntax examples, and technical text. I combined both into one mixed dataset, used it to create the FP16 KLD baseline, and then measured the KLD against that baseline for every model I found.
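For reference, the workflow is roughly the following; the file names here are placeholder stand-ins for the real corpora, and the commented commands show llama.cpp's llama-perplexity KLD mode, which (as far as I know) is what's used for measurements like these:

```python
from pathlib import Path

# Hypothetical stand-ins -- swap in the real 700 KB FLORES-200 extract
# and the real calibration_data_v5_rc.txt
Path("flores_sample.txt").write_text("multilingual sample line\n")
Path("calibration_data_v5_rc.txt").write_text("technical sample line\n")

# Concatenate both sources into the mixed baseline corpus
mixed = (Path("flores_sample.txt").read_text()
         + Path("calibration_data_v5_rc.txt").read_text())
Path("mixed_corpus.txt").write_text(mixed)

# llama.cpp then records the FP16 logits once and scores each quant
# against them (llama-perplexity's KLD flags):
#   llama-perplexity -m Qwen3.5-35B-A3B-F16.gguf -f mixed_corpus.txt \
#       --kl-divergence-base baseline.kld
#   llama-perplexity -m <quant>.gguf \
#       --kl-divergence-base baseline.kld --kl-divergence
```

The baseline file only has to be generated once; every quant is then scored against the same recorded FP16 logits, which is what makes the numbers in the tables comparable.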
I prepared two tables: one sorted by the classical "KLD mean" value and one sorted by the "KLD 99%" value (the 99th-percentile per-token divergence), similar to the plots Unsloth published in their latest blog post about the Qwen models.
I'm not going to try to declare a winner here; that's up to you, given your very specific constraints as a GPU-poor user. To make it a little easier to spot the models punching above their weight, I simply compare each model's numbers to the model below it and mark them in bold if they are lower or higher for the chosen metric.
The PP/s (prompt-processing) and TG/s (token-generation) columns are very hardware-specific numbers that will probably be meaningless to most users: you'd need an Intel CPU, an RTX 3090 (Ampere), and Linux with CUDA driver version 580.126.18 to reproduce them. I used llama-bench with a context length of 10k to obtain these numbers.
Looking at TG/s, for example, the slowest is Unsloth's UD-Q3_K_XL from before their last update at ~105 t/s, and the fastest is Mungert's q4_1 at ~143 t/s. That's a spread of roughly 36% in token-generation speed on my specific hardware, which is shockingly high and one of the reasons it's hard to crown a single "best" model.
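The quoted spread follows directly from the table values:

```python
# Spread between the slowest and fastest measured token-generation speeds
slowest = 105.006853  # cmp-nct_UD-Q3_K_XL (older Unsloth UD-Q3_K_XL)
fastest = 143.116543  # Mungert_q4_1
spread_pct = (fastest - slowest) / slowest * 100
print(f"{spread_pct:.1f}%")  # -> 36.3%
```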
Note: the cmp-nct-prefixed models in the tables are mirrors of the older Unsloth quants from before their latest upload, which I also wanted to measure.
Sorted by KLD mean
| Model | KLD mean | GiB | PP/s | TG/s |
|---|---|---|---|---|
| unsloth_UD-Q4_K_XL | 0.016158 | 20.70 | 2812.949429 | 122.616934 |
| AesSedai_Q4_K_M | 0.016308 | 20.62 | 2966.807082 | 123.676699 |
| unsloth_Q4_K_M | 0.016708 | 20.49 | 2821.819502 | 123.910904 |
| bartowski_Q4_K_L | 0.020222 | 20.27 | 2809.591483 | 130.155778 |
| unsloth_Q4_K_S | 0.020469 | 19.24 | 2838.399411 | 124.346442 |
| bartowski_Q4_K_M | 0.022723 | 19.92 | 2806.437093 | 131.632558 |
| cmp-nct_UD-Q4_K_XL | 0.022863 | 19.16 | 2861.949731 | 125.816493 |
| ubergarm_Q4_0 | 0.024576 | 19.78 | 2876.503157 | 124.357224 |
| unsloth_UD-Q4_K_L | 0.024691 | 18.81 | 2861.777605 | 131.242261 |
| bartowski_Q4_K_S | 0.025161 | 19.19 | 2849.248198 | 134.693183 |
| Mungert_q4_k_m | 0.026718 | 20.08 | 2812.234371 | 137.328114 |
| cmp-nct_UD-Q4_K_M | 0.030445 | 18.48 | 2840.653679 | 136.462817 |
| bartowski_Q4_1 | 0.030681 | 20.45 | 2831.282134 | 136.927623 |
| bartowski_IQ4_NL | 0.032332 | 18.50 | 2981.250713 | 137.735717 |
| bartowski_IQ4_XS | 0.032829 | 17.52 | 3017.103823 | 135.980487 |
| AesSedai_IQ4_XS | 0.037086 | 16.40 | 3016.284929 | 120.057024 |
| unsloth_UD-IQ4_NL | 0.037691 | 16.59 | 2850.872626 | 123.322993 |
| unsloth_UD-IQ4_XS | 0.037835 | 16.28 | 2855.705903 | 121.589312 |
| bartowski_Q4_0 | 0.040627 | 18.80 | 2921.368478 | 137.152109 |
| Mungert_iq4_nl | 0.040920 | 18.36 | 2996.884610 | 140.422106 |
| Mungert_iq4_xs | 0.042396 | 17.37 | 3042.389900 | 139.850819 |
| Mungert_q4_1 | 0.045873 | 20.26 | 2833.595098 | 143.116543 |
| cmp-nct_UD-Q3_K_XL | 0.048064 | 16.05 | 2739.799015 | 105.006853 |
| Mungert_iq3_m | 0.049971 | 16.58 | 2871.107320 | 138.612701 |
| Mungert_iq3_s | 0.049971 | 16.58 | 2874.769301 | 139.805846 |
| bartowski_Q3_K_XL | 0.061445 | 16.13 | 2660.731996 | 123.457777 |
| Mungert_q3_k_m | 0.061488 | 16.29 | 2710.267499 | 131.202303 |
| Mungert_q4_0 | 0.084376 | 18.24 | 2956.897238 | 143.063168 |
Sorted by KLD 99%
| Model | KLD 99% | GiB | PP/s | TG/s |
|---|---|---|---|---|
| unsloth_UD-Q4_K_XL | 0.145385 | 20.70 | 2812.949429 | 122.616934 |
| AesSedai_Q4_K_M | 0.147057 | 20.62 | 2966.807082 | 123.676699 |
| unsloth_Q4_K_M | 0.147594 | 20.49 | 2821.819502 | 123.910904 |
| unsloth_Q4_K_S | 0.177634 | 19.24 | 2838.399411 | 124.346442 |
| bartowski_Q4_K_L | 0.179187 | 20.27 | 2809.591483 | 130.155778 |
| cmp-nct_UD-Q4_K_XL | 0.191735 | 19.16 | 2861.949731 | 125.816493 |
| bartowski_Q4_K_M | 0.205318 | 19.92 | 2806.437093 | 131.632558 |
| unsloth_UD-Q4_K_L | 0.208308 | 18.81 | 2861.777605 | 131.242261 |
| ubergarm_Q4_0 | 0.222435 | 19.78 | 2876.503157 | 124.357224 |
| bartowski_Q4_K_S | 0.227099 | 19.19 | 2849.248198 | 134.693183 |
| Mungert_q4_k_m | 0.235314 | 20.08 | 2812.234371 | 137.328114 |
| cmp-nct_UD-Q4_K_M | 0.252636 | 18.48 | 2840.653679 | 136.462817 |
| bartowski_Q4_1 | 0.264378 | 20.45 | 2831.282134 | 136.927623 |
| bartowski_IQ4_NL | 0.284880 | 18.50 | 2981.250713 | 137.735717 |
| bartowski_IQ4_XS | 0.289398 | 17.52 | 3017.103823 | 135.980487 |
| unsloth_UD-IQ4_NL | 0.311913 | 16.59 | 2850.872626 | 123.322993 |
| AesSedai_IQ4_XS | 0.312924 | 16.40 | 3016.284929 | 120.057024 |
| unsloth_UD-IQ4_XS | 0.316742 | 16.28 | 2855.705903 | 121.589312 |
| Mungert_q4_1 | 0.335030 | 20.26 | 2833.595098 | 143.116543 |
| bartowski_Q4_0 | 0.351119 | 18.80 | 2921.368478 | 137.152109 |
| Mungert_iq4_nl | 0.362384 | 18.36 | 2996.884610 | 140.422106 |
| Mungert_iq4_xs | 0.376657 | 17.37 | 3042.389900 | 139.850819 |
| cmp-nct_UD-Q3_K_XL | 0.396947 | 16.05 | 2739.799015 | 105.006853 |
| Mungert_iq3_m | 0.409071 | 16.58 | 2871.107320 | 138.612701 |
| Mungert_iq3_s | 0.409071 | 16.58 | 2874.769301 | 139.805846 |
| bartowski_Q3_K_XL | 0.500855 | 16.13 | 2660.731996 | 123.457777 |
| Mungert_q3_k_m | 0.506792 | 16.29 | 2710.267499 | 131.202303 |
| Mungert_q4_0 | 0.748218 | 18.24 | 2956.897238 | 143.063168 |
Edit: If you want some models included that I forgot, you have 24 hours to post a link to the models you want measured; otherwise I'm going to reclaim my HDD space.