Are Unsloth models as good as I read?

Reddit r/LocalLLaMA / 4/26/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post asks whether Unsloth’s quantized models are as good as users claim, specifically comparing raw model performance against Unsloth Studio variants.
  • One example user reports a sizable speed difference on an MBP with 64GB RAM: ~39 tokens/s for qwen3.6:35b-a3b Q4_K_M versus ~57 tokens/s for unsloth/qwen3.6:35b-a3b UD-Q4_K_XL.
  • The author attributes the improvement to Unsloth’s per-layer sensitivity analysis that assigns different quantization levels to layers deemed more or less important.
  • The post notes that this approach is expected not only to reduce model size but also to preserve or even improve quality, and invites others to share real-world experiences.

Has anybody done any comparisons between the models that Unsloth offers and their counterparts?
For example: I've been using qwen3.6:35b-a3b Q4_K_M, and on my MBP with 64GB I get around 39 t/s.
Using Unsloth Studio, unsloth/qwen3.6:35b-a3b UD-Q4_K_XL, I get around 57 t/s.

The difference in speed is significant. From what I've understood, the Unsloth model runs a per-layer sensitivity analysis and assigns different quantization levels depending on how "important" each layer is. This obviously makes the model smaller, and from what I've been reading, it should even perform better than a uniform quant.
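The idea described above can be sketched in a few lines. This is a hypothetical illustration, not Unsloth's actual algorithm: it assumes some per-layer sensitivity score already exists (how it is computed is not described in the post) and simply gives the most sensitive layers a higher bit-width while quantizing the rest more aggressively. The function name and parameters are made up for illustration.

```python
# Hypothetical sketch of mixed per-layer quantization assignment.
# NOT Unsloth's actual method; assumes sensitivity scores are given.

def assign_quant_levels(sensitivities, high_bits=6, low_bits=4, top_fraction=0.25):
    """Assign more bits to the most sensitive fraction of layers."""
    n = len(sensitivities)
    n_high = max(1, int(n * top_fraction))
    # Rank layer indices by sensitivity, highest first.
    ranked = sorted(range(n), key=lambda i: sensitivities[i], reverse=True)
    levels = [low_bits] * n
    for idx in ranked[:n_high]:
        levels[idx] = high_bits
    return levels

# Example: 8 layers with made-up sensitivity scores.
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.3, 0.7, 0.5]
print(assign_quant_levels(scores))  # → [6, 4, 4, 6, 4, 4, 4, 4]
```

The end result is a model that spends its bit budget unevenly: most layers get the small 4-bit format, and only the layers judged most important keep a higher-precision format, which is why the file can be smaller without a proportional quality loss.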

What are your experiences?

submitted by /u/denis-craciun