Qwen 3.5 397B (180GB) scores 93% on MMLU
Reddit r/LocalLLaMA / 3/20/2026
📰 News · Signals & Early Trends · Tools & Practical Usage · Models & Research

I see that on MLX there simply is no smaller version of Qwen 3.5 397B other than the 4-bit, and even then the 4-bit is extremely poor on coding and other specifics (I'll have benchmarks by tomorrow for regular MLX). While 4-bit MLX would be closer to 200GB, I was able to make a 180GB quantized version that scored 93% on MMLU (200 questions) with reasoning on, while retaining the full 38 tokens/s of the M3 Ultra chip (GGUF on Mac runs Qwen 3.5 at roughly one-third reduced speed). https://huggingface.co/JANGQ-AI/Qwen3.5-397B-A17B-JANG_2L Does anyone have benchmarks for the Q2 or MLX's 4-bit? It would take me a few hours to leave it running.
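For context on the throughput claim, here is a minimal sketch of how one might time decode speed for an MLX quant with the mlx-lm package. The repo id is taken from the post; the timing harness itself is an illustration, not the author's setup.

```python
# Minimal sketch: measure decode throughput of an MLX quantized model.
# Assumes the mlx-lm package (pip install mlx-lm) and an Apple Silicon Mac
# with enough unified memory to hold the ~180GB quant.
import time
from mlx_lm import load, generate

# Repo name comes from the Reddit post.
model, tokenizer = load("JANGQ-AI/Qwen3.5-397B-A17B-JANG_2L")

prompt = "Explain the difference between MLX and GGUF quantization."
start = time.perf_counter()
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = len(tokenizer.encode(text))
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```

Wall-clock timing here includes prompt processing, so it slightly understates pure decode speed; passing `verbose=True` to `generate` prints MLX's own prompt and generation tokens/s breakdown instead.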
Key Points
- The Reddit post claims that a 180GB quantized version of Qwen 3.5 397B scores 93% on MMLU (200 questions), suggesting strong performance at a relatively small size.
- The post notes that the 4-bit MLX variant is poor for coding and similar tasks, claims the 180GB quant preserves the full 38 tokens/s on an M3 Ultra, and says GGUF on Mac runs Qwen 3.5 about one-third slower.
- A Hugging Face link to the quantized Qwen3.5-397B model is provided, and the author asks for benchmarks of the Q2 (GGUF) or MLX 4-bit variants, indicating ongoing benchmarking and comparisons.
- The submission by user HealthyCommunicat on r/LocalLLaMA reflects active community benchmarking in the LLM quantization space.
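The 93% figure comes from a 200-question slice rather than the full MMLU test set. Below is a minimal sketch of such a spot check, assuming the cais/mmlu dataset on Hugging Face and a hypothetical ask() wrapper around whatever runtime serves the model; it is an illustration, not the author's harness.

```python
# Minimal sketch: score a model on a random 200-question MMLU subset.
# Assumes the `datasets` package and the cais/mmlu dataset; ask() is a
# hypothetical placeholder for a call into your local inference runtime.
import random
from datasets import load_dataset

LETTERS = "ABCD"
mmlu = load_dataset("cais/mmlu", "all", split="test")
sample = random.Random(0).sample(range(len(mmlu)), 200)

def ask(prompt: str) -> str:
    """Placeholder: send the prompt to your local model, return its completion."""
    raise NotImplementedError

correct = 0
for i in sample:
    row = mmlu[i]
    choices = "\n".join(f"{LETTERS[j]}. {c}" for j, c in enumerate(row["choices"]))
    prompt = (f"{row['question']}\n{choices}\n"
              "Answer with a single letter (A, B, C, or D).")
    reply = ask(prompt).strip().upper()
    # Naive parsing: assumes the reply begins with the chosen letter.
    if reply[:1] == LETTERS[row["answer"]]:
        correct += 1

print(f"accuracy: {correct / 200:.1%}")  # the post reports 93% with reasoning on
```

Worth noting: 200 questions out of MMLU's roughly 14k test items leaves a wide margin; at 93% accuracy the 95% confidence interval is about ±3.5 points, which matters when comparing quants against each other.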