Is it normal for Gemma 4 26B/31B to run this fast on an Intel laptop? (288V / CachyOS)

Reddit r/LocalLLaMA / 4/12/2026

Key Points

A new local LLM user reports getting Gemma 4 MoE models (26B/31B GGUF) to run unusually fast on an Intel Core Ultra 9 288V laptop under CachyOS.
They initially struggled with Ollama/LM Studio and “hard stops,” and couldn’t get OpenVINO to integrate well with the NPU for these larger models.
To make it work, they compiled a custom Vulkan GPU bridge, after which GPU usage reached about 95–100%, CPU usage stayed at roughly 10–40%, and RAM use sat around 20–24GB.
Reported throughput is roughly 7–12 tokens/sec at 16k context for the 26B model; the 31B variant is described only as “decent/fluid” at 4–8k context, with no swap used so far.
The poster asks whether this performance level is typical for integrated graphics and whether Intel Lunar Lake-class hardware is particularly strong for local MoE models.

Hey everyone, I just got into local LLMs about a week ago. I tried Ollama and LM Studio on my Core Ultra 9 288V, but they kept failing or giving me "hard stops" on the MoE models, so I figured I’d just try building the environment myself.

I couldn’t get OpenVINO to play nice with the NPU for these larger models yet, so I just compiled a custom Vulkan bridge for the GPU instead. It seems to be working?

Performance Stats:

Model: Gemma-4-26B-it-i1 (GGUF)
Speed: 7-12 t/s (16k context)
Hardware Use: 95-100% GPU, 10-40% CPU, 20-24GB RAM.
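
To put the 7-12 t/s figure in perspective, here is a rough sketch that converts a decode rate into wall-clock time for one reply. The 500-token reply length and the function name are illustrative assumptions, not measurements from the post, and prompt processing time is ignored.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate num_tokens at a steady decode rate.

    Ignores prompt processing (prefill), which adds extra time up front.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


REPLY_TOKENS = 500  # hypothetical reply length, chosen only for illustration

slow = generation_seconds(REPLY_TOKENS, 7.0)   # low end of the reported range
fast = generation_seconds(REPLY_TOKENS, 12.0)  # high end of the reported range
print(f"{REPLY_TOKENS} tokens take about {fast:.0f}-{slow:.0f} s")
```

Under those assumptions, a reply of that length would take somewhere between roughly 40 and 70 seconds.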

I also tried the 31B-it-i1-Q4_K_M.gguf version. It's a bit heavier but still totally usable:

Speed: Decent/Fluid (4-8k context)
Hardware Use: 100% GPU, ~30-60% CPU (Xe2 and the logic cores seem to be sharing the load well).
RAM: Pushing 26GB out of 29GB free, but 0GB swap used so far.
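
For anyone who wants to check the swap and headroom numbers themselves on Linux, here is a minimal sketch that parses `/proc/meminfo`. The helper names are made up for illustration, and it assumes the standard `SwapTotal`, `SwapFree` and `MemAvailable` fields, which are reported in kB.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_gib(fields: dict[str, int]) -> float:
    """Swap currently in use, in GiB (0.0 if the swap fields are missing)."""
    used_kb = fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)
    return used_kb / (1024 ** 2)


def available_gib(fields: dict[str, int]) -> float:
    """Memory the kernel estimates is still available, in GiB."""
    return fields.get("MemAvailable", 0) / (1024 ** 2)


meminfo = Path("/proc/meminfo")
if meminfo.exists():  # only present on Linux
    info = parse_meminfo(meminfo.read_text())
    print(f"swap used: {swap_used_gib(info):.2f} GiB")
    print(f"available: {available_gib(info):.2f} GiB")
```

Running it while the model is generating shows whether swap use is still at zero and how much memory is left.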

Is this a normal result for integrated graphics? At first I only got it working on the CPU, which was faster but unsustainable; once the Vulkan bridge was built, the load became balanced. I'm using CachyOS, if that makes a difference.

Just wanted to see if I’m missing something or if Intel Lunar Lake is actually this cracked for local MoE.

submitted by /u/No-Key8555