I have an M5 MacBook Pro (32 GB) running macOS 26.4 with LM Studio, and I suspect my generation speeds are low:
8 t/s: Gemma 3 27B, 4-bit, MLX
32 t/s: Nemotron 3 Nano 4B, GGUF
39 t/s: GPT-OSS 20B, MLX
All models were loaded with the default context settings, using these runtime versions:
MLX v1.4.0 M5 Metal
Llama v2.8.0
Has anyone with a similar configuration gotten the same speeds? A MacBook Air instead of a Pro is fine too.
Alternatively, tell me which other models you've run in LM Studio (GGUF or MLX), with their bit size, parameter count and the t/s you got, and I'll replicate them to see whether I get similar numbers.
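For a rough sanity check on the numbers above, here is a back-of-envelope sketch. It assumes single-stream decoding is memory-bandwidth bound (every active weight is read once per generated token), that the base M5 has roughly 153 GB/s of unified memory bandwidth, and that 4-bit MLX weights cost about 4.5 bits each once per-group scales and biases are counted. All three are assumptions, and the function names are made up for illustration.

```python
# Back-of-envelope decode-speed ceiling for a memory-bandwidth-bound model.
# Assumption: at batch size 1, each generated token reads every active weight
# once; KV-cache traffic, activations and compute limits are ignored.


def decode_ceiling_tps(active_params_billion: float,
                       bits_per_weight: float,
                       bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s = memory bandwidth / bytes read per token."""
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


def efficiency(measured_tps: float, ceiling_tps: float) -> float:
    """Fraction of the theoretical ceiling actually achieved."""
    return measured_tps / ceiling_tps


if __name__ == "__main__":
    # Assumed figures: dense 27B model, ~4.5 effective bits per weight
    # (4-bit values plus per-group scale/bias), ~153 GB/s bandwidth.
    ceiling = decode_ceiling_tps(27, 4.5, 153)
    print(f"ceiling: {ceiling:.2f} t/s")
    print(f"efficiency at 8 t/s: {efficiency(8, ceiling):.2f}")
```

Under these assumptions the ceiling for Gemma 3 27B is about 10 t/s, so 8 t/s would be roughly 79% of it. Real runs usually fall somewhat short of the ceiling because of KV-cache reads and runtime overhead, so treat the ratio as a rough guide only. For MoE models such as GPT-OSS 20B, pass the active parameter count rather than the total, since only the active experts are read per token.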