Friendly reminder inference is WAY faster on Linux vs windows

Reddit r/LocalLLaMA / 3/29/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A Reddit user reports that Ollama inference on Linux (Ubuntu 22.04) is dramatically faster than on Windows 10 for two Qwen models tested at similar quantization and context lengths.
  • In their quick benchmarks, Linux roughly doubled the tokens-per-second rate (18→31 t/s for Qwen Code Next q4, and 48→105 t/s for Qwen 3 30B A3B Q4).
  • The author suggests this is a larger performance gap than they expected and asks whether others have observed similar differences.
  • They share the result as a practical reminder for people running local LLM inference to consider OS-level performance impacts.
  • The post is based on simple, user-run inference tests rather than a formal controlled study, so exact causes (drivers, builds, runtime settings) are not identified.

I have a simple home lab PC: 64 GB DDR4, an RTX 8000 48 GB (Turing architecture), and a Core i9-9900K CPU. I run Ubuntu 22.04 LTS. Before I used this PC as a home lab, it ran Windows 10. Over the weekend I reinstalled my Windows 10 SSD to check out my old projects. I updated Ollama to the latest version, and tokens per second were way slower than when I was running Linux. I knew Linux performs better, but I didn’t think it would be twice as fast. Here are the results from a few simple inference tests:

QWEN Code Next, q4, ctx length: 6k

Windows: 18 t/s

Linux: 31 t/s (+72%)

QWEN 3 30B A3B, Q4, ctx 6k

Windows: 48 t/s

Linux: 105 t/s (+118%)
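For reference, here is a minimal sketch of how tokens-per-second figures and the percentage gains above are typically computed. It assumes Ollama's API convention of reporting an `eval_count` (tokens generated) and an `eval_duration` in nanoseconds; treat those field names as an assumption rather than a spec here.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Decode speed in tokens/s from generated-token count and elapsed nanoseconds."""
    return eval_count / (eval_duration_ns / 1e9)

def speedup_pct(baseline_tps: float, new_tps: float) -> float:
    """Percentage gain of new_tps over baseline_tps."""
    return (new_tps / baseline_tps - 1.0) * 100.0

# Using the post's numbers: 48 t/s on Windows vs 105 t/s on Linux
print(round(speedup_pct(48, 105)))  # prints 119 (the post truncates to +118%)
```

The same arithmetic gives the +72% figure for the first model: (31 / 18 − 1) × 100 ≈ 72%.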

Has anyone else experienced a performance gap this large before? Am I missing something?

Anyway, I thought I’d share this as a reminder for anyone looking for a bit more performance!

submitted by /u/triynizzles1