M5 32GB LM Studio, double checking my speeds

Reddit r/LocalLLaMA / 3/29/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A user with an M5 MacBook Pro (32GB) running LM Studio reports perceived low generation speeds and shares measured tokens-per-second (t/s) for several local LLMs.
  • The reported results are approximately 8 t/s for Gemma3 27B 4-bit (MLX), 32 t/s for Nemotron 3 Nano 4B (GGUF), and 39 t/s for GPT OSS 20B (MLX).
  • They specify default context settings and runtime/component versions (MLX v1.4.0 on Metal and Llama v2.8.0) and ask others to confirm comparable speeds on similar hardware.
  • The user also invites community members to share other LM Studio model configurations (format, parameter size, and bit width) so they can reproduce and validate performance.
  • The thread is essentially a troubleshooting/performance comparison request rather than a new product release or technical guide.

I have an M5 MBP 32GB with macOS 26.4, using LM Studio, and I suspect my speeds are low:

8 t/s Gemma3 27B 4Bit MLX

32 t/s Nemotron 3 Nano 4B GGUF

39 t/s GPT OSS 20B MLX

All models were loaded with Default Context settings and I used the following runtime versions:

MLX v1.4.0 M5 Metal

Llama v2.8.0

Can someone tell me if they got the same speeds with a similar configuration? Even if it's a MacBook Air instead of a Pro.

Or, if you can share other models you've used in LM Studio (format GGUF/MLX, bit width, and parameter count in billions), I can replicate the setup and double-check whether I get a similar t/s.
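For anyone comparing numbers: LM Studio can serve an OpenAI-compatible API on its local server (default `http://localhost:1234/v1`), so a small script can time a completion and compute t/s the same way for everyone. This is a minimal sketch, not LM Studio's own readout; the model id and prompt are placeholders, and a non-streaming request lumps prompt processing into the elapsed time, so it will slightly understate pure generation speed.

```python
import json
import time
import urllib.request

# Assumes LM Studio's local server is running on its default port.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Generation speed: tokens produced divided by wall-clock seconds."""
    return completion_tokens / elapsed_s


def benchmark(model: str, prompt: str = "Explain KV caching in one paragraph.") -> float:
    """Time one non-streaming completion and return its measured t/s."""
    payload = json.dumps({
        "model": model,  # placeholder id; use the id shown in LM Studio
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode()
    req = urllib.request.Request(
        LMSTUDIO_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    # The OpenAI-compatible response reports token counts under "usage".
    return tokens_per_second(body["usage"]["completion_tokens"], elapsed)


if __name__ == "__main__":
    print(f"{benchmark('gemma-3-27b-it'):.1f} t/s")
```

Running this a few times and averaging gives a number comparable across machines; e.g. 80 tokens generated in 10 s works out to 8 t/s, matching the Gemma3 27B figure above.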

submitted by /u/nemuro87