Qwen3.5-35B GGUF quants (16–22 GiB) - KLD + speed comparison

Reddit r/LocalLLaMA / 3/16/2026


Key Points

  • The article benchmarks the Kullback-Leibler divergence (KLD) of Qwen3.5-35B-A3B GGUF quantizations (16–22 GiB) against FP16 using a mixed dataset assembled from FLORES 200 and calibration_data_v5_rc.txt to measure distribution similarity.
  • It presents two tables, sorted by KLD mean and KLD 99%, to compare models against a baseline and mirrors the approach from Unsloth's blogpost on Qwen models.
  • In TG/s speed measurements, the slowest model was UD-Q3_K_XL at about 105 t/s while the fastest was Mungert's iq4_nl at around 143 t/s, under a setup requiring an Intel CPU, an RTX 3090, and Linux with CUDA Driver 580.126.18 using llama-bench with a 10k context.
  • The author avoids declaring a single winner, stressing that readers should choose based on their specific constraints, especially in GPU-poor environments, and notes how results highlight models that perform well relative to their size and cost.

Qwen3.5-35B GGUF quants (16–22 GiB) - KLD + speed comparison

I'm back with some more benchmarks. I benchmarked the Kullback-Leibler divergence (KLD) of the current Qwen3.5-35B-A3B GGUF quantizations (16–22 GiB) available on Hugging Face.

KLD: The Kullback-Leibler divergence measures how closely the quantized model's logit distribution matches the FP16 baseline's over a reference corpus. Lower values mean the quant behaves more like FP16.
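To make the metric concrete, here is a minimal sketch of a per-token KL divergence between two logit vectors (the FP16 baseline P and a quant Q). This is an illustration of the formula, not the llama.cpp implementation; function names are my own.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats: P is the FP16 baseline, Q the quantized model."""
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits give a divergence of 0; any mismatch gives a positive value.
print(kl_divergence([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0 (up to float error)
```

The "KLD mean" and "KLD 99%" columns in the tables below are the mean and the 99th percentile of these per-token values across the whole corpus.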

u/TitwitMuffbiscuit had a shot at this some time ago, but unfortunately all the models were updated shortly after he published his measurements.

For this research I decided not to use the English-only Wikitext-2 test dataset and instead took the multilingual FLORES 200 dataset, from which I extracted 700 KB of lines across randomly chosen languages. Additionally, I found another interesting dataset, calibration_data_v5_rc.txt (about 400 KB), which covers topics such as programming, math, syntax examples, and technical text. I combined both into a mixed dataset, created the FP16 KLD baseline on it, and measured the KLD against this baseline for all the models I found.

I prepared two tables: one sorted by the classical "KLD mean" value and one sorted by the "KLD 99%" value, similar to the plots Unsloth published in their latest blog post about the Qwen models.

I'm not going to declare a winner here; that's up to you, given your very specific constraints as a GPU-poor user. To make it a little easier to spot the models that are punching above their weight, I simply compare each model's numbers to the model below it and print them in bold if they are lower or higher on the chosen metric.

The PP/s (prompt-processing) and TG/s (token-generation) columns are very hardware-specific numbers that will probably be meaningless to most users. You'd need an Intel CPU, an RTX 3090 GPU (Ampere), and Linux with CUDA driver version 580.126.18 to make use of them. I used llama-bench with a context length of 10k to obtain these numbers.
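For reference, a llama-bench run of this shape could look like the sketch below. The post doesn't give the exact command, so the flag values and the model filename are my assumptions; `-p`, `-n`, and `-ngl` are standard llama-bench options.

```shell
# Illustrative only: prompt-processing at ~10k tokens, short generation test,
# all layers offloaded to the GPU. Filename is hypothetical.
llama-bench -m Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf -p 10240 -n 128 -ngl 99
```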

Looking at the TG/s speed, for example, UD-Q3_K_XL from Unsloth (before their last update) was the slowest at ~105 t/s, while the fastest is Mungert's iq4_nl at ~143 t/s. That is a total spread of 36.2% in token-generation speed on my specific hardware, which is shockingly high and one of the reasons it's a little hard to define a so-called best model.
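The quoted spread follows directly from the two rounded TG/s figures, measured relative to the slowest model:

```python
# ~105 t/s for UD-Q3_K_XL (slowest), ~143 t/s for Mungert's iq4_nl (fastest)
slowest, fastest = 105.0, 143.0
spread_pct = (fastest - slowest) / slowest * 100
print(f"{spread_pct:.1f}%")  # 36.2%
```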

Note: The cmp-nct-prefixed models in the tables are a mirror of the older Unsloth quants that I grabbed before their latest upload and also wanted to measure.

Sorted by KLD mean

Model KLD mean GiB PP/s TG/s
unsloth_UD-Q4_K_XL 0.016158 20.70 2812.949429 122.616934
AesSedai_Q4_K_M 0.016308 20.62 2966.807082 123.676699
unsloth_Q4_K_M 0.016708 20.49 2821.819502 123.910904
bartowski_Q4_K_L 0.020222 20.27 2809.591483 130.155778
unsloth_Q4_K_S 0.020469 19.24 2838.399411 124.346442
bartowski_Q4_K_M 0.022723 19.92 2806.437093 131.632558
cmp-nct_UD-Q4_K_XL 0.022863 19.16 2861.949731 125.816493
ubergarm_Q4_0 0.024576 19.78 2876.503157 124.357224
unsloth_UD-Q4_K_L 0.024691 18.81 2861.777605 131.242261
bartowski_Q4_K_S 0.025161 19.19 2849.248198 134.693183
Mungert_q4_k_m 0.026718 20.08 2812.234371 137.328114
cmp-nct_UD-Q4_K_M 0.030445 18.48 2840.653679 136.462817
bartowski_Q4_1 0.030681 20.45 2831.282134 136.927623
bartowski_IQ4_NL 0.032332 18.50 2981.250713 137.735717
bartowski_IQ4_XS 0.032829 17.52 3017.103823 135.980487
AesSedai_IQ4_XS 0.037086 16.40 3016.284929 120.057024
unsloth_UD-IQ4_NL 0.037691 16.59 2850.872626 123.322993
unsloth_UD-IQ4_XS 0.037835 16.28 2855.705903 121.589312
bartowski_Q4_0 0.040627 18.80 2921.368478 137.152109
Mungert_iq4_nl 0.040920 18.36 2996.884610 140.422106
Mungert_iq4_xs 0.042396 17.37 3042.389900 139.850819
Mungert_q4_1 0.045873 20.26 2833.595098 143.116543
cmp-nct_UD-Q3_K_XL 0.048064 16.05 2739.799015 105.006853
Mungert_iq3_m 0.049971 16.58 2871.107320 138.612701
Mungert_iq3_s 0.049971 16.58 2874.769301 139.805846
bartowski_Q3_K_XL 0.061445 16.13 2660.731996 123.457777
Mungert_q3_k_m 0.061488 16.29 2710.267499 131.202303
Mungert_q4_0 0.084376 18.24 2956.897238 143.063168

Sorted by KLD 99%

Model KLD 99% GiB PP/s TG/s
unsloth_UD-Q4_K_XL 0.145385 20.70 2812.949429 122.616934
AesSedai_Q4_K_M 0.147057 20.62 2966.807082 123.676699
unsloth_Q4_K_M 0.147594 20.49 2821.819502 123.910904
unsloth_Q4_K_S 0.177634 19.24 2838.399411 124.346442
bartowski_Q4_K_L 0.179187 20.27 2809.591483 130.155778
cmp-nct_UD-Q4_K_XL 0.191735 19.16 2861.949731 125.816493
bartowski_Q4_K_M 0.205318 19.92 2806.437093 131.632558
unsloth_UD-Q4_K_L 0.208308 18.81 2861.777605 131.242261
ubergarm_Q4_0 0.222435 19.78 2876.503157 124.357224
bartowski_Q4_K_S 0.227099 19.19 2849.248198 134.693183
Mungert_q4_k_m 0.235314 20.08 2812.234371 137.328114
cmp-nct_UD-Q4_K_M 0.252636 18.48 2840.653679 136.462817
bartowski_Q4_1 0.264378 20.45 2831.282134 136.927623
bartowski_IQ4_NL 0.284880 18.50 2981.250713 137.735717
bartowski_IQ4_XS 0.289398 17.52 3017.103823 135.980487
unsloth_UD-IQ4_NL 0.311913 16.59 2850.872626 123.322993
AesSedai_IQ4_XS 0.312924 16.40 3016.284929 120.057024
unsloth_UD-IQ4_XS 0.316742 16.28 2855.705903 121.589312
Mungert_q4_1 0.335030 20.26 2833.595098 143.116543
bartowski_Q4_0 0.351119 18.80 2921.368478 137.152109
Mungert_iq4_nl 0.362384 18.36 2996.884610 140.422106
Mungert_iq4_xs 0.376657 17.37 3042.389900 139.850819
cmp-nct_UD-Q3_K_XL 0.396947 16.05 2739.799015 105.006853
Mungert_iq3_m 0.409071 16.58 2871.107320 138.612701
Mungert_iq3_s 0.409071 16.58 2874.769301 139.805846
bartowski_Q3_K_XL 0.500855 16.13 2660.731996 123.457777
Mungert_q3_k_m 0.506792 16.29 2710.267499 131.202303
Mungert_q4_0 0.748218 18.24 2956.897238 143.063168

Edit: If you want some models included that I forgot, you have 24 hours to post a link to the models you want measured; otherwise I'm going to reclaim my HDD space.

submitted by /u/StrikeOner