M5 32GB LM Studio, double checking my speeds

Reddit r/LocalLLaMA / 3/29/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A user with an M5 MacBook Pro (32GB) running LM Studio reports perceived low generation speeds and shares measured tokens-per-second (t/s) for several local LLMs.
  • The reported results are approximately 8 t/s for Gemma3 27B 4-bit (MLX), 32 t/s for Nemotron 3 Nano 4B (GGUF), and 39 t/s for GPT OSS 20B (MLX).
  • They specify default context settings and runtime/component versions (MLX v1.4.0 on Metal and Llama v2.8.0) and ask others to confirm comparable speeds on similar hardware.
  • The user also invites community members to share other LM Studio model configurations (format, parameter size, and bit width) so they can reproduce and validate performance.
  • The thread is essentially a troubleshooting/performance comparison request rather than a new product release or technical guide.

I have an M5 MBP 32GB with macOS 26.4, using LM Studio, and I suspect my speeds are low:

8 t/s Gemma3 27B 4Bit MLX

32 t/s Nemotron 3 Nano 4B GGUF

39 t/s GPT OSS 20B MLX

All models were loaded with Default Context settings and I used the following runtime versions:

MLX v1.4.0 M5 Metal

Llama v2.8.0

Can someone tell me if they got the same speeds with a similar configuration? Even if it's a MacBook Air instead of a Pro.

Or, if you can share other models you've used in LM Studio (format GGUF/MLX, bit width, and parameter count in billions), I can replicate the setup and double-check whether I get a similar t/s.
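For anyone comparing numbers: LM Studio can serve an OpenAI-compatible API on its local server (default `http://localhost:1234/v1`), so a small script can time a completion and compute t/s the same way for everyone. This is a minimal sketch, not LM Studio's own readout; the model id and prompt are placeholders, and a non-streaming request lumps prompt processing into the elapsed time, so it will slightly understate pure generation speed.

```python
import json
import time
import urllib.request

# Assumes LM Studio's local server is running on its default port.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Generation speed: tokens produced divided by wall-clock seconds."""
    return completion_tokens / elapsed_s


def benchmark(model: str, prompt: str = "Explain KV caching in one paragraph.") -> float:
    """Time one non-streaming completion and return its measured t/s."""
    payload = json.dumps({
        "model": model,  # placeholder id; use the id shown in LM Studio
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode()
    req = urllib.request.Request(
        LMSTUDIO_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    # The OpenAI-compatible response reports token counts under "usage".
    return tokens_per_second(body["usage"]["completion_tokens"], elapsed)


if __name__ == "__main__":
    print(f"{benchmark('gemma-3-27b-it'):.1f} t/s")
```

Running this a few times and averaging gives a number comparable across machines; e.g. 80 tokens generated in 10 s works out to 8 t/s, matching the Gemma3 27B figure above.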

submitted by /u/nemuro87