AI Navigate

Qwen 3.5 397B (180GB) scores 93% on MMLU

Reddit r/LocalLLaMA / 3/20/2026


Key Points

  • The Reddit post claims that a 180GB quantized version of Qwen 3.5 397B scores 93% on MMLU (200 questions), suggesting strong performance at a relatively small size.
  • The post notes that 4-bit MLX variants are poor at coding and other tasks, claims the 180GB quantized version retains the full 38 tokens/s on an M3 Ultra, and that GGUF on Mac runs about one-third slower.
  • A HuggingFace link to the quantized Qwen3.5-397B model is provided, and the author asks for benchmarks of Q2 or MLX 4-bit quantizations, indicating ongoing benchmarking and comparisons.
  • The submission by user HealthyCommunicat on Reddit’s r/LocalLLaMA highlights ongoing community benchmarking in the LLM quantization space.
Qwen 3.5 397B (180GB) scores 93% on MMLU

I see that on MLX there simply is no smaller version of Qwen 3.5 397B other than the 4-bit, and even then the 4-bit is extremely poor at coding and other specifics (I'll have benchmarks by tomorrow for regular MLX). While 4-bit MLX would be closer to 200GB, I was able to make a 180GB quantized version that scored 93% (with reasoning on) across 200 MMLU questions while retaining the full 38 tokens/s of M3 Ultra speeds (GGUF on Mac runs about one-third slower for Qwen 3.5).

https://huggingface.co/JANGQ-AI/Qwen3.5-397B-A17B-JANG_2L

Does anyone have benchmarks for the Q2 or MLX's 4-bit? It would take me a few hours to leave it running.

submitted by /u/HealthyCommunicat
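For readers who want to run the same kind of comparison themselves, the post's "93% on 200 questions" figure is just multiple-choice accuracy over a sampled MMLU subset. Below is a minimal sketch of such a scoring loop; `ask_model` is a hypothetical placeholder (swap in a real call to your quantized model, e.g. via mlx-lm or an HTTP endpoint) and the two sample items are illustrative, not MMLU data.

```python
# Minimal MMLU-style multiple-choice accuracy check.
# ask_model is a hypothetical stub: replace it with a real model call
# that returns one option letter ("A".."D") per question.

def ask_model(question: str, options: dict) -> str:
    # Stub: always answers "A". A real implementation would prompt the
    # quantized model with the question and the lettered options.
    return "A"

def mmlu_accuracy(items: list) -> float:
    """Score items shaped like {"question", "options", "answer"}."""
    correct = 0
    for item in items:
        pred = ask_model(item["question"], item["options"]).strip().upper()
        if pred == item["answer"]:
            correct += 1
    return correct / len(items)

# Illustrative sample, not real MMLU questions.
sample = [
    {"question": "2 + 2 = ?",
     "options": {"A": "4", "B": "5", "C": "3", "D": "22"},
     "answer": "A"},
    {"question": "Capital of France?",
     "options": {"A": "Berlin", "B": "Paris", "C": "Rome", "D": "Madrid"},
     "answer": "B"},
]

print(mmlu_accuracy(sample))  # stub answers "A" on both, so 0.5 here
```

With a 200-question subset, each question moves the score by 0.5 percentage points, so small subsets carry meaningful sampling noise when comparing quantizations.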