It is hard to find any concrete performance figures so I am posting mine:
- OpenClaw 2026.3.8
- LM Studio 0.4.6+1
- Unsloth gpt-oss-20b-Q4_K_S.gguf
- Context size 26035
- All other model settings at their defaults (GPU offload = 18, CPU thread pool size = 7, max concurrents = 4, number of experts = 4, flash attention = on)
With this setup, after the first prompt I get 34 tok/s and a 0.7 s time to first token.
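For a sense of what those two numbers mean in practice, here is a minimal sketch that turns them into an end-to-end latency estimate (the simple linear model, total ≈ TTFT + tokens / throughput, is my assumption, not something measured in the post):

```python
# Rough latency model: total time ≈ TTFT + n_tokens / throughput.
TTFT_S = 0.7      # time to first token, seconds (figure from the post)
TOK_PER_S = 34.0  # steady-state generation speed (figure from the post)

def est_seconds(n_tokens: int) -> float:
    """Estimate wall-clock seconds to generate n_tokens tokens."""
    return TTFT_S + n_tokens / TOK_PER_S

print(f"{est_seconds(512):.1f} s for a 512-token reply")
```

So a typical 512-token reply lands in roughly 16 seconds on this hardware, which is the number that actually matters for interactive use.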