LM Studio CPU thread pool size vs. tk/s with some MoE layers offloaded to CPU
Reddit r/LocalLLaMA / 4/18/2026
💬 Opinion · Signals & Early Trends · Tools & Practical Usage
Key Points
- The post examines how LM Studio's CPU thread pool size affects tokens-per-second (tk/s) throughput when some Mixture-of-Experts (MoE) layers are offloaded to the CPU.
- It presents a chart comparing observed throughput across a range of thread pool sizes.
- The results suggest that CPU parallelism settings can materially shift the latency/throughput tradeoff in partially CPU-offloaded MoE workloads.
- The takeaway: benchmark the thread pool size for your specific model and offloading configuration rather than relying on a one-size-fits-all setting (see the sketch after this list).
- The discussion is aimed at local LLM users optimizing performance on their own hardware.
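
As a rough way to run such a benchmark yourself, the sketch below times generations against LM Studio's OpenAI-compatible local server (by default on port 1234); the thread pool size itself is changed in LM Studio's settings between runs. This is an illustrative sketch, not the original poster's method: the model name, prompt, and token counts are placeholders, and the measured rate is end-to-end (including prompt processing) rather than pure decode throughput.

```python
# Hypothetical benchmark sketch: measure tokens/sec via LM Studio's
# OpenAI-compatible local server (default http://localhost:1234/v1).
# Change the CPU thread pool size in LM Studio between runs and compare.
import time
import requests

URL = "http://localhost:1234/v1/chat/completions"
PROMPT = "Explain mixture-of-experts routing in two paragraphs."

def measure_tokens_per_sec(max_tokens: int = 256) -> float:
    """Run one generation and return completion tokens per second.

    Note: the elapsed time includes prompt processing, so this is an
    end-to-end rate, not the pure decode (generation) rate.
    """
    payload = {
        "model": "local-model",  # placeholder; LM Studio serves whatever is loaded
        "messages": [{"role": "user", "content": PROMPT}],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # keep runs as comparable as possible
    }
    start = time.perf_counter()
    resp = requests.post(URL, json=payload, timeout=600)
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    completion_tokens = resp.json()["usage"]["completion_tokens"]
    return completion_tokens / elapsed

if __name__ == "__main__":
    # Set a thread pool size in LM Studio, run this script, record the
    # numbers, then repeat for each candidate thread count.
    runs = [measure_tokens_per_sec() for _ in range(3)]
    print("tok/s per run:", [f"{r:.1f}" for r in runs])
    print(f"mean: {sum(runs) / len(runs):.1f} tok/s")
```

Averaging over a few runs matters here, since CPU-offloaded throughput can vary with thermal state and background load; the sweet spot is often near, but not always at, the physical core count.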