Qwen3.5-35B GGUF quants (16–22 GiB) - KLD + speed comparison
I'm back with some more benchmarks. This time I measured the KL divergence (KLD) of the current Qwen3.5-35B-A3B GGUF quantizations (16–22 GiB) available on Hugging Face.

KLD: the Kullback-Leibler divergence, which measures how closely the quantized model's next-token probability distribution matches the FP16 baseline's over a reference corpus. Lower is better; 0 means the distributions are identical.
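For anyone unfamiliar with the metric, here's a minimal sketch of what a per-token KLD computation looks like (this is an illustration, not the exact llama.cpp implementation): both models produce logits for the same token position, the logits are turned into probabilities via softmax, and the divergence is summed over the vocabulary.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    z = np.asarray(logits, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()

def kld(logits_fp16, logits_quant):
    """KL(P || Q) where P is the FP16 distribution, Q the quantized one."""
    p = softmax(logits_fp16)
    q = softmax(logits_quant)
    return float(np.sum(p * np.log(p / q)))

# Identical logits -> divergence of exactly 0
print(kld([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
# Perturbed logits -> small positive divergence
print(kld([1.0, 2.0, 3.0], [1.1, 2.0, 2.9]))
```

The reported "KLD mean" is this value averaged over every token position in the corpus.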
u/TitwitMuffbiscuit had a shot at this some time ago but unfortunately all the models got updated a short period after he published his measurements.
For this run I decided against the English-only Wikitext-2 test set and instead used the multilingual FLORES-200 dataset, from which I extracted 700 KB of lines across randomly chosen languages. I also found another interesting dataset, calibration_data_v5_rc.txt (about 400 KB), which covers topics such as programming, math, syntax examples, and technical text. I combined both into one mixed dataset, used it to create the FP16 KLD baseline, and then measured the KLD against that baseline for every model I found.
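For reference, the workflow is roughly the following; the file names here are placeholder stand-ins for the real corpora, and the commented commands show llama.cpp's llama-perplexity KLD mode, which (as far as I know) is what's used for measurements like these:

```python
from pathlib import Path

# Hypothetical stand-ins -- swap in the real 700 KB FLORES-200 extract
# and the real calibration_data_v5_rc.txt
Path("flores_sample.txt").write_text("multilingual sample line\n")
Path("calibration_data_v5_rc.txt").write_text("technical sample line\n")

# Concatenate both sources into the mixed baseline corpus
mixed = (Path("flores_sample.txt").read_text()
         + Path("calibration_data_v5_rc.txt").read_text())
Path("mixed_corpus.txt").write_text(mixed)

# llama.cpp then records the FP16 logits once and scores each quant
# against them (llama-perplexity's KLD flags):
#   llama-perplexity -m Qwen3.5-35B-A3B-F16.gguf -f mixed_corpus.txt \
#       --kl-divergence-base baseline.kld
#   llama-perplexity -m <quant>.gguf \
#       --kl-divergence-base baseline.kld --kl-divergence
```

The baseline file only has to be generated once; every quant is then scored against the same recorded FP16 logits, which is what makes the numbers in the tables comparable.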
I prepared two tables: one sorted by the classical "KLD mean" value and one sorted by the "KLD 99%" value (the 99th-percentile per-token divergence), similar to the plots Unsloth published in their latest blog post about the Qwen models.
I'm not going to try to declare a winner here; that's up to you, given your very specific constraints as a GPU-poor user. To make it a little easier to spot the models punching above their weight, I simply compare each model's numbers to the model below it and mark them in bold if they are lower or higher for the chosen metric.
The PP/s (prompt-processing) and TG/s (token-generation) columns are very hardware-specific numbers that will probably be meaningless to most users: you'd need an Intel CPU, an RTX 3090 (Ampere), and Linux with CUDA driver version 580.126.18 to reproduce them. I used llama-bench with a context length of 10k to obtain these numbers.
Looking at TG/s, for example, the slowest is Unsloth's UD-Q3_K_XL from before their last update at ~105 t/s, and the fastest is Mungert's q4_1 at ~143 t/s. That's a spread of roughly 36% in token-generation speed on my specific hardware, which is shockingly high and one of the reasons it's hard to crown a single "best" model.
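The quoted spread follows directly from the table values:

```python
# Spread between the slowest and fastest measured token-generation speeds
slowest = 105.006853  # cmp-nct_UD-Q3_K_XL (older Unsloth UD-Q3_K_XL)
fastest = 143.116543  # Mungert_q4_1
spread_pct = (fastest - slowest) / slowest * 100
print(f"{spread_pct:.1f}%")  # -> 36.3%
```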
Note: the cmp-nct-prefixed models in the tables are mirrors of the older Unsloth quants from before their latest upload, which I also wanted to measure.
Sorted by KLD mean
| Model | KLD mean | GiB | PP/s | TG/s |
|---|---|---|---|---|
| unsloth_UD-Q4_K_XL | 0.016158 | 20.70 | 2812.949429 | 122.616934 |
| AesSedai_Q4_K_M | 0.016308 | 20.62 | 2966.807082 | 123.676699 |
| unsloth_Q4_K_M | 0.016708 | 20.49 | 2821.819502 | 123.910904 |
| bartowski_Q4_K_L | 0.020222 | 20.27 | 2809.591483 | 130.155778 |
| unsloth_Q4_K_S | 0.020469 | 19.24 | 2838.399411 | 124.346442 |
| bartowski_Q4_K_M | 0.022723 | 19.92 | 2806.437093 | 131.632558 |
| cmp-nct_UD-Q4_K_XL | 0.022863 | 19.16 | 2861.949731 | 125.816493 |
| ubergarm_Q4_0 | 0.024576 | 19.78 | 2876.503157 | 124.357224 |
| unsloth_UD-Q4_K_L | 0.024691 | 18.81 | 2861.777605 | 131.242261 |
| bartowski_Q4_K_S | 0.025161 | 19.19 | 2849.248198 | 134.693183 |
| Mungert_q4_k_m | 0.026718 | 20.08 | 2812.234371 | 137.328114 |
| cmp-nct_UD-Q4_K_M | 0.030445 | 18.48 | 2840.653679 | 136.462817 |
| bartowski_Q4_1 | 0.030681 | 20.45 | 2831.282134 | 136.927623 |
| bartowski_IQ4_NL | 0.032332 | 18.50 | 2981.250713 | 137.735717 |
| bartowski_IQ4_XS | 0.032829 | 17.52 | 3017.103823 | 135.980487 |
| AesSedai_IQ4_XS | 0.037086 | 16.40 | 3016.284929 | 120.057024 |
| unsloth_UD-IQ4_NL | 0.037691 | 16.59 | 2850.872626 | 123.322993 |
| unsloth_UD-IQ4_XS | 0.037835 | 16.28 | 2855.705903 | 121.589312 |
| bartowski_Q4_0 | 0.040627 | 18.80 | 2921.368478 | 137.152109 |
| Mungert_iq4_nl | 0.040920 | 18.36 | 2996.884610 | 140.422106 |
| Mungert_iq4_xs | 0.042396 | 17.37 | 3042.389900 | 139.850819 |
| Mungert_q4_1 | 0.045873 | 20.26 | 2833.595098 | 143.116543 |
| cmp-nct_UD-Q3_K_XL | 0.048064 | 16.05 | 2739.799015 | 105.006853 |
| Mungert_iq3_m | 0.049971 | 16.58 | 2871.107320 | 138.612701 |
| Mungert_iq3_s | 0.049971 | 16.58 | 2874.769301 | 139.805846 |
| bartowski_Q3_K_XL | 0.061445 | 16.13 | 2660.731996 | 123.457777 |
| Mungert_q3_k_m | 0.061488 | 16.29 | 2710.267499 | 131.202303 |
| Mungert_q4_0 | 0.084376 | 18.24 | 2956.897238 | 143.063168 |
Sorted by KLD 99%
| Model | KLD 99% | GiB | PP/s | TG/s |
|---|---|---|---|---|
| unsloth_UD-Q4_K_XL | 0.145385 | 20.70 | 2812.949429 | 122.616934 |
| AesSedai_Q4_K_M | 0.147057 | 20.62 | 2966.807082 | 123.676699 |
| unsloth_Q4_K_M | 0.147594 | 20.49 | 2821.819502 | 123.910904 |
| unsloth_Q4_K_S | 0.177634 | 19.24 | 2838.399411 | 124.346442 |
| bartowski_Q4_K_L | 0.179187 | 20.27 | 2809.591483 | 130.155778 |
| cmp-nct_UD-Q4_K_XL | 0.191735 | 19.16 | 2861.949731 | 125.816493 |
| bartowski_Q4_K_M | 0.205318 | 19.92 | 2806.437093 | 131.632558 |
| unsloth_UD-Q4_K_L | 0.208308 | 18.81 | 2861.777605 | 131.242261 |
| ubergarm_Q4_0 | 0.222435 | 19.78 | 2876.503157 | 124.357224 |
| bartowski_Q4_K_S | 0.227099 | 19.19 | 2849.248198 | 134.693183 |
| Mungert_q4_k_m | 0.235314 | 20.08 | 2812.234371 | 137.328114 |
| cmp-nct_UD-Q4_K_M | 0.252636 | 18.48 | 2840.653679 | 136.462817 |
| bartowski_Q4_1 | 0.264378 | 20.45 | 2831.282134 | 136.927623 |
| bartowski_IQ4_NL | 0.284880 | 18.50 | 2981.250713 | 137.735717 |
| bartowski_IQ4_XS | 0.289398 | 17.52 | 3017.103823 | 135.980487 |
| unsloth_UD-IQ4_NL | 0.311913 | 16.59 | 2850.872626 | 123.322993 |
| AesSedai_IQ4_XS | 0.312924 | 16.40 | 3016.284929 | 120.057024 |
| unsloth_UD-IQ4_XS | 0.316742 | 16.28 | 2855.705903 | 121.589312 |
| Mungert_q4_1 | 0.335030 | 20.26 | 2833.595098 | 143.116543 |
| bartowski_Q4_0 | 0.351119 | 18.80 | 2921.368478 | 137.152109 |
| Mungert_iq4_nl | 0.362384 | 18.36 | 2996.884610 | 140.422106 |
| Mungert_iq4_xs | 0.376657 | 17.37 | 3042.389900 | 139.850819 |
| cmp-nct_UD-Q3_K_XL | 0.396947 | 16.05 | 2739.799015 | 105.006853 |
| Mungert_iq3_m | 0.409071 | 16.58 | 2871.107320 | 138.612701 |
| Mungert_iq3_s | 0.409071 | 16.58 | 2874.769301 | 139.805846 |
| bartowski_Q3_K_XL | 0.500855 | 16.13 | 2660.731996 | 123.457777 |
| Mungert_q3_k_m | 0.506792 | 16.29 | 2710.267499 | 131.202303 |
| Mungert_q4_0 | 0.748218 | 18.24 | 2956.897238 | 143.063168 |
Edit: If you want some models included that I forgot, you have 24 hours to post a link to the models you want measured; otherwise I'm going to reclaim my HDD space.