As a lifelong Windows user (don't hate me, I was exposed to it at a young age), I was wondering how much performance, if any, I'm leaving on the table. So I did the sensible thing and ran some benchmarks.
Setup:
- OS: Windows 11 25H2 vs Lubuntu 26.04
- Engine: Llama.cpp b8929, CUDA 13.1 (official prebuilt binaries on Windows; compiled myself with CMake on Lubuntu — see the build sketch after this list)
- CPU: Intel Core i9-14900KF
- RAM: 64GB DDR5 6800 MT/s
- GPU: RTX 5080 16GB VRAM
- Drivers: 596.32 (Windows) / 595.x (Lubuntu)
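For anyone who wants to reproduce the Linux side, the build boils down to the standard CMake CUDA recipe. A minimal sketch, assuming the CUDA toolkit and build tools are already installed (not necessarily the exact commands I ran; package names and paths may differ on your box):

```bash
# minimal sketch: CUDA build of llama.cpp on an Ubuntu-family distro
sudo apt install -y build-essential cmake git   # assumes the CUDA 13.1 toolkit is already set up
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON                   # GGML_CUDA=ON enables the CUDA backend
cmake --build build --config Release -j "$(nproc)"
# binaries (llama-cli, llama-server, ...) land in build/bin/
```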
The Results (Averaged)
I ran the same 2,500+ token prompt through llama-cli across several models, four runs each (three for GPT-OSS-120B on Windows), and averaged the results.
(Note: Gemma 4, OSS-20B & Qwen3.6 were fully offloaded to the GPU. Qwen3.5 & OSS-120B were hybrid CPU/GPU runs using -t 8 -tb 8 -fit on)
| Model | Win 11 (Prompt) | Lubuntu (Prompt) | Prompt Diff | Win 11 (Gen) | Lubuntu (Gen) | Gen Diff |
|---|---|---|---|---|---|---|
| Gemma-4-E4B-it (Q8_K_XL) | 6,232 t/s | 7,587 t/s | + 21.7% | 111.7 t/s | 116.7 t/s | + 4.4% |
| Qwen3.5-35B-A3B (Q8_K_XL) | 305 t/s | 742 t/s | + 143.2% | 48.1 t/s | 52.2 t/s | + 8.5% |
| GPT-OSS-20B (MXFP4) | 7,619 t/s | 8,140 t/s | + 6.8% | 195.8 t/s | 206.2 t/s | + 5.3% |
| Qwen3.6-27B (IQ4_XS) | 2,077 t/s | 2,235 t/s | + 7.6% | 43.8 t/s | 46.0 t/s | + 5.0% |
| GPT-OSS-120B (MXFP4) | 310 t/s | 649 t/s | + 109.3% | 43.4 t/s | 44.9 t/s | + 3.4% |
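The table values are just the arithmetic mean of the per-run numbers from the logs below, with the diff computed as (Lubuntu ÷ Windows − 1) × 100. A throwaway sketch of that math, using the Gemma 4 prompt runs as input:

```bash
# average the per-run prompt speeds and compute the Lubuntu-vs-Windows delta
win=(4038.3 7341.7 6432.1 7116.3)   # Windows 11 Gemma 4 prompt runs (t/s)
lin=(7621.5 7537.8 7665.7 7523.5)   # Lubuntu Gemma 4 prompt runs (t/s)
avg() { printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.1f", s / NR }'; }
w=$(avg "${win[@]}"); l=$(avg "${lin[@]}")
awk -v w="$w" -v l="$l" \
    'BEGIN { printf "Win: %s t/s  Lubuntu: %s t/s  Diff: +%.1f%%\n", w, l, (l / w - 1) * 100 }'
# -> Win: 6232.1 t/s  Lubuntu: 7587.1 t/s  Diff: +21.7%
```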
Takeaways
- Generation Speeds: Lubuntu is consistently 3-9% faster for token generation. A nice bump, but maybe not enough to justify an OS swap on its own if all you care about is reading speed.
- Prompt Processing (Fully Offloaded): Linux handles prompt evaluation on the GPU noticeably faster. Even at the low end it's about 7% faster, and up to ~22% faster on the Gemma 4 run.
- Prompt Processing (CPU/GPU Hybrid): This is where it gets crazy. On the models where Llama.cpp had to lean on the CPU (-t 8 -tb 8), Linux completely obliterated Windows, more than doubling prompt processing speed (+109% and +143%).
Raw Run Logs:
Windows 11:
.\llama-cli -m "E:\models\unsloth\gemma-4-E4B-it-GGUF\gemma-4-E4B-it-UD-Q8_K_XL.gguf" -c 8192 -mli -fa on --temp 1.0 --top-k 64 --top-p 0.95 --min-p 0.0 -ngl all -np 1 --no-mmap --jinja --chat-template-kwargs '{\"enable_thinking\":true}'
[ Prompt: 4038.3 t/s | Generation: 111.6 t/s ][ Prompt: 7341.7 t/s | Generation: 111.8 t/s ][ Prompt: 6432.1 t/s | Generation: 111.9 t/s ][ Prompt: 7116.3 t/s | Generation: 111.7 t/s ]

.\llama-cli -m "E:\models\unsloth\Qwen3.5-35B-A3B-GGUF\Qwen3.5-35B-A3B-UD-Q8_K_XL.gguf" -c 16384 -mli -fa on --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0.0 -np 1 --no-mmap --chat-template-kwargs "{\"enable_thinking\":true}" -t 8 -tb 8 -fit on -fitt 160M
[ Prompt: 296.5 t/s | Generation: 48.4 t/s ][ Prompt: 308.6 t/s | Generation: 48.0 t/s ][ Prompt: 313.7 t/s | Generation: 48.2 t/s ][ Prompt: 302.1 t/s | Generation: 47.8 t/s ]

.\llama-cli -m "E:\models\lmstudio-community\gpt-oss-20b-GGUF\gpt-oss-20b-MXFP4.gguf" -c 32768 -mli -fa on --temp 1.0 --top-k 0 --top-p 1.0 --min-p 0.0 -ngl all -np 1 --no-mmap --jinja
[ Prompt: 7651.2 t/s | Generation: 195.6 t/s ][ Prompt: 7661.0 t/s | Generation: 196.6 t/s ][ Prompt: 7653.2 t/s | Generation: 196.6 t/s ][ Prompt: 7510.8 t/s | Generation: 194.6 t/s ]

.\llama-cli -m "E:\models\unsloth\Qwen3.6-27B-GGUF\Qwen3.6-27B-IQ4_XS.gguf" -c 8192 -mli -fa on --temp 1.0 --top-k 20 --top-p 0.95 --min-p 0.0 --presence_penalty 1.5 -ngl all -np 1 --no-mmap --jinja
[ Prompt: 1859.4 t/s | Generation: 43.2 t/s ][ Prompt: 2132.9 t/s | Generation: 43.0 t/s ][ Prompt: 2153.1 t/s | Generation: 44.5 t/s ][ Prompt: 2166.1 t/s | Generation: 44.5 t/s ]

.\llama-cli -m "E:\models\lmstudio-community\gpt-oss-120b-GGUF\gpt-oss-120b-MXFP4-00001-of-00002.gguf" -c 16384 -mli -fa on --temp 1.0 --top-k 0 --top-p 1.0 --min-p 0.0 -np 1 --no-mmap --jinja -t 8 -tb 8 -fit on -fitt 160M
[ Prompt: 324.3 t/s | Generation: 43.3 t/s ][ Prompt: 320.8 t/s | Generation: 43.4 t/s ][ Prompt: 284.9 t/s | Generation: 43.4 t/s ]

Lubuntu 26.04:
./llama-cli -m /home/user/models/gemma-4-E4B-it-GGUF/gemma-4-E4B-it-UD-Q8_K_XL.gguf -c 8192 -mli -fa on --temp 1.0 --top-k 64 --top-p 0.95 --min-p 0.0 -ngl all -np 1 --no-mmap --jinja --chat-template-kwargs "{\"enable_thinking\":true}"
[ Prompt: 7621.5 t/s | Generation: 116.6 t/s ][ Prompt: 7537.8 t/s | Generation: 116.6 t/s ][ Prompt: 7665.7 t/s | Generation: 116.7 t/s ][ Prompt: 7523.5 t/s | Generation: 116.8 t/s ]

./llama-cli -m /home/user/models/Qwen3.5-35B-A3B-GGUF/Qwen3.5-35B-A3B-UD-Q8_K_XL.gguf -c 16384 -mli -fa on --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0.0 -np 1 --no-mmap --chat-template-kwargs "{\"enable_thinking\":true}" -t 8 -tb 8 -fit on -fitt 160M
[ Prompt: 739.4 t/s | Generation: 52.3 t/s ][ Prompt: 744.6 t/s | Generation: 52.0 t/s ][ Prompt: 746.3 t/s | Generation: 52.3 t/s ][ Prompt: 741.3 t/s | Generation: 52.2 t/s ]

./llama-cli -m /home/user/models/gpt-oss-20b-GGUF/gpt-oss-20b-MXFP4.gguf -c 32768 -mli -fa on --temp 1.0 --top-k 0 --top-p 1.0 --min-p 0.0 -ngl all -np 1 --no-mmap --jinja
[ Prompt: 7819.8 t/s | Generation: 205.7 t/s ][ Prompt: 8250.8 t/s | Generation: 206.4 t/s ][ Prompt: 8254.9 t/s | Generation: 206.9 t/s ][ Prompt: 8237.0 t/s | Generation: 206.0 t/s ]

./llama-cli -m /home/user/models/Qwen3.6-27B-GGUF/Qwen3.6-27B-IQ4_XS.gguf -c 8192 -mli -fa on --temp 1.0 --top-k 20 --top-p 0.95 --min-p 0.0 --presence_penalty 1.5 -ngl all -np 1 --no-mmap --jinja
[ Prompt: 2238.1 t/s | Generation: 46.0 t/s ][ Prompt: 2232.3 t/s | Generation: 46.0 t/s ][ Prompt: 2235.4 t/s | Generation: 46.0 t/s ][ Prompt: 2237.3 t/s | Generation: 46.0 t/s ]

./llama-cli -m /home/user/models/gpt-oss-120b-GGUF/gpt-oss-120b-MXFP4-00001-of-00002.gguf -c 16384 -mli -fa on --temp 1.0 --top-k 0 --top-p 1.0 --min-p 0.0 -np 1 --no-mmap --jinja -fit on -fitt 160M -t 8 -tb 8
[ Prompt: 650.0 t/s | Generation: 45.2 t/s ][ Prompt: 647.8 t/s | Generation: 45.0 t/s ][ Prompt: 650.3 t/s | Generation: 44.7 t/s ][ Prompt: 649.0 t/s | Generation: 45.0 t/s ]




