Is it normal for Gemma 4 26B/31B to run this fast on an Intel laptop? (288V / CachyOS)

Reddit r/LocalLLaMA / 4/12/2026


Key Points

  • A new local LLM user reports getting Gemma 4 MoE models (26B/31B GGUF) to run unusually fast on an Intel Core Ultra 9 288V laptop under CachyOS.
  • They initially struggled with Ollama/LM Studio and “hard stops,” and couldn’t get OpenVINO to integrate well with the NPU for these larger models.
  • To make it work, they compiled a custom Vulkan bridge for the GPU, after which GPU usage reached about 95–100%, with CPU usage at 10–40% and RAM use around 20–24GB.
  • Reported throughput is roughly 7–12 tokens/sec at 16k context for the 26B model. The 31B variant is described only as “decent/fluid” at 4–8k context, with RAM nearly full but no swap used so far.
  • The poster asks whether this performance level is typical for integrated graphics and whether Intel Lunar Lake-class hardware is particularly strong for local MoE models.

Hey everyone, I just got into local LLMs about a week ago. I tried Ollama and LM Studio on my Core Ultra 9 288V, but they kept failing or giving me "hard stops" on the MoE models, so I figured I’d just try building the environment myself.

I couldn’t get OpenVINO to play nice with the NPU for these larger models yet, so I just compiled a custom Vulkan bridge for the GPU instead. It seems to be working?

Performance Stats:

  • Model: Gemma-4-26B-it-i1 (GGUF)
  • Speed: 7-12 t/s (16k context)
  • Hardware Use: 95-100% GPU, 10-40% CPU, 20-24GB RAM.

I also tried the 31B-it-i1-Q4_K_M.gguf version. It's a bit heavier but still totally usable:

  • Speed: Decent/Fluid (4-8k context)
  • Hardware Use: 100% GPU, ~30-60% CPU (Xe2 and the logic cores seem to be sharing the load well).
  • RAM: Pushing 26GB out of 29GB free, but 0GB swap used so far.
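
A rough sanity check on the RAM figure above, as a minimal Python sketch. It assumes about 4.8 bits per weight for Q4_K_M (a ballpark, not a measured value for this file) and a hypothetical 4 GB allowance for KV cache and buffers. Both numbers are placeholders to adjust, and the function names are made up for illustration.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_ram(n_params: float, bits_per_weight: float,
                overhead_gb: float, free_gb: float):
    """Return (estimated total GB, True if it fits in free_gb)."""
    total = weight_footprint_gb(n_params, bits_per_weight) + overhead_gb
    return total, total <= free_gb


# Assumed inputs: 31e9 parameters, ~4.8 bits/weight, 4 GB overhead (hypothetical),
# 29 GB free as reported in the post.
total_gb, ok = fits_in_ram(31e9, 4.8, overhead_gb=4.0, free_gb=29.0)
print(f"{total_gb:.1f} GB estimated, fits: {ok}")
```

Under these assumptions the estimate lands in the low 20s of GB, the same range as the reported usage. The real figure depends on context length and runtime buffers.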

Is this a normal result for integrated graphics? At first I only got it working on the CPU, which was faster but unsustainable. Once the Vulkan bridge was built, the load became balanced. I'm using CachyOS, if that makes a difference.

Just wanted to see if I’m missing something or if Intel Lunar Lake is actually this cracked for local MoE.

submitted by /u/No-Key8555