AI Navigate

Mac Mini M4 32GB Local LLM Performance

Reddit r/LocalLLaMA / 3/18/2026

💬 Opinion · Tools & Practical Usage · Models & Research

Key Points

  • The author posts concrete performance figures for a Mac Mini M4 with 32GB RAM running a local LLM setup.
  • The setup uses OpenClaw 2026.3.8, LM Studio 0.4.6+1, and Unsloth gpt-oss-20b-Q4_K_S.gguf.
  • All model settings were left at their defaults: GPU offload 18, CPU thread pool size 7, max concurrents 4, number of experts 4, and flash attention enabled.
  • They report a context size of 26035 tokens and, after the first prompt, 34 tokens per second with a 0.7-second time to first token.

It is hard to find any concrete performance figures so I am posting mine:

  • OpenClaw 2026.3.8
  • LM Studio 0.4.6+1
  • Unsloth gpt-oss-20b-Q4_K_S.gguf
  • Context size 26035
  • All other model settings are at the defaults (GPU offload = 18, CPU thread pool size = 7, max concurrents = 4, number of experts = 4, flash attention = on)
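For reference, here is a minimal sketch of how these settings could map onto llama-cpp-python's `Llama` constructor, which runs the same GGUF format. The parameter names follow llama-cpp-python; LM Studio's internal names may differ, and the model path is a placeholder.

```python
# Hypothetical mapping of the reported LM Studio settings onto
# llama-cpp-python parameters (an assumption, not the author's setup).
settings = {
    "model_path": "gpt-oss-20b-Q4_K_S.gguf",  # placeholder path
    "n_gpu_layers": 18,   # GPU offload
    "n_threads": 7,       # CPU thread pool size
    "n_ctx": 26035,       # context size
    "flash_attn": True,   # flash attention enabled
}

# Loading requires the model file locally, e.g.:
# from llama_cpp import Llama
# llm = Llama(**settings)

print(settings["n_ctx"])
```

Note that `max concurrents` and `number of experts` have no direct equivalent in this constructor; they are LM Studio / MoE-runtime settings.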

With this, after the first prompt I get 34 tok/s and a 0.7 s time to first token.
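To put those two numbers together: total response latency is roughly the time to first token plus the decode time for the reply. A quick estimate for a hypothetical 500-token reply (the reply length is an assumption for illustration):

```python
# Estimate end-to-end latency from the reported figures.
ttft = 0.7           # seconds to first token (reported)
rate = 34.0          # tokens per second (reported)
reply_tokens = 500   # hypothetical reply length

total_seconds = ttft + reply_tokens / rate
print(f"{total_seconds:.1f}")  # ~15.4 seconds for a 500-token reply
```

So at these speeds, typical chat-length replies land in the 5-20 second range on this hardware.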

submitted by /u/groover75