2 days ago there was a very cool post by u/nickl:
https://reddit.com/r/LocalLLaMA/comments/1s7r9wu/
Highly recommend checking it out!
I've run this benchmark on a bunch of local models that can fit into my RTX 5080, some of them partially offloaded to RAM (I have 96GB, but most will fit if you have 64).
Results:
24: unsloth/Qwen3.5-122B-A10B-GGUF:UD-Q4_K_XL 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟥🟩 🟩🟩🟩🟩🟩
23: bartowski/Qwen_Qwen3.5-27B-GGUF:IQ4_XS 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟥🟩 🟥🟩🟩🟩🟩
23: unsloth/Qwen3.5-122B-A10B-GGUF:UD-IQ3_XXS 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟥🟩 🟥🟩🟩🟩🟩
22: unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q6_K_XL 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟥🟩🟩 🟩🟩🟩🟥🟩 🟥🟩🟩🟩🟩
22: mradermacher/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-i1-GGUF:Q3_K_M 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟥🟩🟥🟩 🟥🟩🟩🟩🟩
22: Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟥🟥🟩 🟥🟩🟩🟩🟩
21: unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF:UD-Q4_K_S 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟨🟥 🟥🟨🟩🟩🟩
20: unsloth/Qwen3-Coder-Next-GGUF:UD-Q5_K_XL 🟩🟩🟩🟩🟨 🟩🟩🟩🟩🟩 🟩🟩🟨🟩🟩 🟩🟩🟩🟥🟨 🟥🟩🟩🟩🟩
20: mradermacher/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-i1-GGUF:Q6_K 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟥🟩🟩 🟥🟩🟩🟥🟩 🟥🟥🟩🟩🟩
19: unsloth/GLM-4.7-Flash-GGUF:UD-Q6_K_XL 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟥🟩🟩 🟩🟩🟩🟥🟨 🟥🟨🟩🟥🟩
18: unsloth/GLM-4.5-Air-GGUF:Q5_K_M 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟥🟩🟩 🟥🟩🟩🟥🟩 🟨🟨🟥🟩🟨
18: bartowski/nvidia_Nemotron-Cascade-2-30B-A3B-GGUF:Q6_K_L 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟨🟩🟩 🟩🟩🟩🟥🟩 🟨🟨🟥🟨🟨
17: Jackrong/Qwopus3.5-9B-v3-GGUF:Q8_0 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟥🟥🟩🟩 🟥🟩🟥🟥🟥 🟥🟩🟩🟩🟨
16: unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL 🟩🟩🟩🟩🟨 🟩🟩🟩🟩🟩 🟩🟩🟨🟩🟩 🟥🟨🟩🟥🟨 🟥🟨🟩🟨🟩
16: byteshape/Devstral-Small-2-24B-Instruct-2512-GGUF:IQ3_S 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟥🟩🟨🟩🟩 🟩🟩🟨🟥🟨 🟨🟨🟥🟨🟩
16: mradermacher/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING-i1-GGUF:Q6_K 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟨🟥🟩 🟥🟩🟥🟥🟨 🟥🟩🟥🟩🟨
14: mradermacher/Qwen3.5-9B-Claude-4.6-HighIQ-INSTRUCT-i1-GGUF:Q6_K 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟥🟩🟥🟩🟩 🟩🟨🟥🟥🟨 🟨🟨🟥🟨🟨
14: unsloth/GLM-4.6V-GGUF:Q3_K_S 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟥🟩🟨🟨🟩 🟥🟩🟩🟨🟨 🟨🟨🟨🟨🟨
5: bartowski/Tesslate_OmniCoder-9B-GGUF:Q6_K_L 🟨🟨🟨🟨🟨 🟨🟨🟨🟩🟩 🟩🟨🟨🟩🟨 🟨🟨🟩🟨🟨 🟨🟨🟨🟨🟨
5: unsloth/Qwen3.5-9B-GGUF:UD-Q6_K_XL 🟨🟨🟨🟨🟨 🟨🟨🟨🟩🟩 🟨🟩🟨🟨🟩 🟨🟩🟨🟨🟨 🟨🟨🟨🟨🟨

The biggest surprise, to be honest, is Qwen3.5-9B-Claude-4.6-HighIQ-THINKING: it goes from 5 green tests with the base Qwen3.5-9B to 16. Most of the base model's errors boiled down to being unable to call the tools with correct formatting. For how small it is, it's a very reliable finetune.
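For anyone eyeballing the grid: the score on the left is just the number of green cells out of 25 (e.g. "5 green tests" matches the base Qwen3.5-9B row). A quick sketch to recount a row yourself, assuming you've pasted the emoji row into a shell variable:

```shell
# Count passed (green) tests in one 25-cell result row.
# grep -o prints each 🟩 match on its own line; wc -l counts them.
row="🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟩🟩 🟩🟩🟩🟥🟩 🟩🟩🟩🟩🟩"
printf '%s' "$row" | grep -o "🟩" | wc -l   # → 24
```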
Qwen3.5-122B-A10B is still king on 16GB GPUs because I can offload the experts to RAM. Speed isn't perfect, but the quality is great and I can still fit a sizable context into VRAM. Q4_K_XL uses around 68GB of RAM and IQ3_XXS around 33GB, so the smaller quant works with 64GB of system RAM.
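If you want to try the expert-offload setup, here's a rough llama.cpp launch sketch. This is an assumption-laden example, not my exact command: it assumes a recent llama-server build with `-hf` download support and the `--override-tensor` (`-ot`) flag, and that the MoE expert tensors follow the usual `ffn_*_exps` naming; the context size is just a placeholder.

```shell
# Sketch only — flag availability and tensor names depend on your
# llama.cpp build and the specific GGUF. The -ot regex pins the MoE
# expert weights to system RAM while the rest stays in VRAM.
llama-server \
  -hf unsloth/Qwen3.5-122B-A10B-GGUF:UD-IQ3_XXS \
  --n-gpu-layers 99 \
  -ot ".ffn_.*_exps.=CPU" \
  -c 32768
```

Newer llama.cpp builds also have a `--n-cpu-moe N` convenience flag that does roughly the same thing without the regex.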
Note, though, that these benchmarks mostly test a fairly isolated SQL call. That makes them a nice quick way to compare two models, tool calling included, but not representative of understanding a larger codebase's context, where bigger models will pull ahead.
Edit: added a 9B Qwopus model