AI Navigate

llama.cpp build b8338 adds OpenVINO backend + NPU support for prefill + kvcache

Reddit r/LocalLLaMA / 3/15/2026

📰 News · Tools & Practical Usage · Models & Research

Key Points

  • llama.cpp release b8338 adds an OpenVINO backend and NPU support for prefill and kv-cache, enabling hardware acceleration for local LLM inference.
  • Intel engineers contributed the work; the poster is eager to test it on the Arc 140T iGPU.
  • The update is documented in the GitHub release page linked in the post.
  • This improvement could translate to faster, more efficient local inference for LLaMA-family models and broader hardware support for developers.

https://github.com/ggml-org/llama.cpp/releases/tag/b8338
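For readers who want to try the new backend, a build might look like the sketch below. The flag name `GGML_OPENVINO` is an assumption based on how other ggml backends are enabled (`GGML_CUDA`, `GGML_VULKAN`, `GGML_SYCL`), and the OpenVINO install path is illustrative; check the b8338 release notes for the exact option.

```shell
# Hedged sketch: building llama.cpp with the OpenVINO backend enabled.
# -DGGML_OPENVINO=ON is assumed by analogy with other backend flags;
# verify against the b8338 release notes before relying on it.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# Make the OpenVINO toolkit visible to CMake (path varies by install).
source /opt/intel/openvino/setupvars.sh

cmake -B build -DGGML_OPENVINO=ON
cmake --build build --config Release -j
```

On Intel hardware like the Arc 140T iGPU mentioned in the post, the OpenVINO runtime selects the target device at inference time, so the same binary should be able to run on CPU, iGPU, or NPU depending on configuration.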

Lots of work done by the Intel team; I'm looking forward to trying this out on the 255H with the Arc 140T iGPU.

submitted by /u/stormy1one