Friendly reminder inference is WAY faster on Linux vs windows

Reddit r/LocalLLaMA / 3/29/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A Reddit user reports that Ollama inference on Linux (Ubuntu 22.04) is dramatically faster than on Windows 10 for two Qwen models tested at similar quantization and context lengths.
  • In their quick benchmarks, Linux roughly doubled the tokens-per-second rate (18→31 t/s for Qwen Code Next q4, and 48→105 t/s for Qwen 3 30B A3B Q4).
  • The author suggests this is a larger performance gap than they expected and asks whether others have observed similar differences.
  • They share the result as a practical reminder for people running local LLM inference to consider OS-level performance impacts.
  • The post is based on simple, user-run inference tests rather than a formal controlled study, so exact causes (drivers, builds, runtime settings) are not identified.

I have a simple home lab PC: 64 GB DDR4, an RTX 8000 48 GB (Turing architecture), and a Core i9-9900K CPU. I run Ubuntu 22.04 LTS. Before I used this PC as a home lab, it ran Windows 10. Over the weekend I reinstalled my Windows 10 SSD to check out my old projects. I updated Ollama to the latest version, and tokens per second were way slower than when I was running Linux. I knew Linux performs better, but I didn’t think it would be twice as fast. Here are the results from a few simple inference tests:

QWEN Code Next, q4, ctx length: 6k

Windows: 18 t/s

Linux: 31 t/s (+72%)

QWEN 3 30B A3B, Q4, ctx 6k

Windows: 48 t/s

Linux: 105 t/s (+118%)
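For reference, here is a minimal sketch of how tokens-per-second figures and the percentage gains above are typically computed. It assumes Ollama's API convention of reporting an `eval_count` (tokens generated) and an `eval_duration` in nanoseconds; treat those field names as an assumption rather than a spec here.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Decode speed in tokens/s from generated-token count and elapsed nanoseconds."""
    return eval_count / (eval_duration_ns / 1e9)

def speedup_pct(baseline_tps: float, new_tps: float) -> float:
    """Percentage gain of new_tps over baseline_tps."""
    return (new_tps / baseline_tps - 1.0) * 100.0

# Using the post's numbers: 48 t/s on Windows vs 105 t/s on Linux
print(round(speedup_pct(48, 105)))  # prints 119 (the post truncates to +118%)
```

The same arithmetic gives the +72% figure for the first model: (31 / 18 − 1) × 100 ≈ 72%.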

Has anyone else experienced a performance gap this large before? Am I missing something?

Anyway, I thought I’d share this as a reminder for anyone looking for a bit more performance!

submitted by /u/triynizzles1