GGUF Quants Arena for MMLU (24GB VRAM + 128GB RAM)

Reddit r/LocalLLaMA / 4/16/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post reports local inference benchmarking on an MMLU (DEV+TEST) subset using llama.cpp with GGUF-quantized models, with only three llama.cpp parameters set (ctx 8192, seed 42, and flash attention enabled).
  • Qwen3.5-27B variants top the results, with Q5_K_XL reaching 87.33% (12263/14042 correct) and Q4_K_XL close behind at 87.25% (12252/14042).
  • Other quantizations, including a Claude-Opus-reasoning-distilled Qwen3.5 finetune, Qwen3-Coder-Next, and Gemma, land roughly between 78% and 87%, while the much larger Qwen3.5-397B at an IQ2_XXS quantization scores markedly lower (65.80%).
  • The results speak to the practicality of getting competitive MMLU scores on the limited hardware named in the title (24GB VRAM plus 128GB system RAM).

Dataset: MMLU subset (DEV+TEST)

llama.cpp settings (only 3 parameters set): ctx 8192, seed 42, fa on

Let me know what else you want to see. Thanks.
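For context, here is a minimal sketch (not the OP's actual harness) of how a multiple-choice MMLU pass like this could be scored against a local llama.cpp server. Only the ctx/seed/fa settings come from the post; the launch command in the comment, the default port 8080, the prompt format, and the letter-extraction logic are illustrative assumptions.

```python
# Minimal MMLU-style multiple-choice scoring sketch against llama.cpp's
# OpenAI-compatible server. Assumes the server was started with something like:
#   llama-server -m Qwen3.5-27B-UD-Q5_K_XL.gguf -c 8192 --seed 42 -fa
# Endpoint URL, prompt format, and answer parsing are illustrative choices.
import re
import requests

URL = "http://localhost:8080/v1/chat/completions"  # default llama-server port

def ask(question: str, choices: list[str]) -> str:
    """Send one multiple-choice question and return the predicted letter."""
    letters = "ABCD"
    prompt = (
        question + "\n"
        + "\n".join(f"{letters[i]}. {c}" for i, c in enumerate(choices))
        + "\nAnswer with a single letter."
    )
    resp = requests.post(URL, json={
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,   # greedy decoding for reproducibility
        "max_tokens": 8,
    }, timeout=300)
    text = resp.json()["choices"][0]["message"]["content"]
    match = re.search(r"\b([ABCD])\b", text)
    return match.group(1) if match else ""

def accuracy(dataset: list[dict]) -> float:
    """dataset items: {'question': str, 'choices': [str, ...], 'answer': 'A'..'D'}"""
    correct = sum(ask(x["question"], x["choices"]) == x["answer"] for x in dataset)
    return correct / len(dataset)
```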

Results:

| Model (GGUF) | Accuracy | Correct / Total | Notes |
| --- | --- | --- | --- |
| Qwen3.5-27B-UD-Q5_K_XL.gguf | 87.33% | 12263/14042 | |
| Qwen3.5-27B-UD-Q4_K_XL.gguf | 87.25% | 12252/14042 | |
| Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled.i1-Q4_K_M.gguf | 87.02% | 12220/14042 | |
| Qwen3-Coder-Next-UD-Q4_K_XL.gguf | 84.38% | 11849/14042 | |
| Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf | 83.25% | 11690/14042 | |
| Qwen3.5-9B-UD-Q8_K_XL.gguf | 78.81% | 11067/14042 | |
| gemma-4-31B-it-UD-Q4_K_XL.gguf | 78.36% | 11004/14042 | errors=1 |
| Qwen3.5-397B-A17B-UD-IQ2_XXS-00001-of-00004.gguf | 65.80% | 9239/14042 | |
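As a quick sanity check (added here, not part of the original post), the reported percentages are simply correct/total rounded to two decimals:

```python
# e.g. the top-scoring Q5_K_XL row from the table above
correct, total = 12263, 14042
print(f"{100 * correct / total:.2f}%")  # prints 87.33%
```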

submitted by /u/qwen_next_gguf_when