Full disclaimer: I used AI to help clean up my mess of thoughts. I have a tendency to lose coherence once I get too many words out.
TL;DR: Bought a B70 on launch day. Hit an impressive 235 t/s with Gemma 3 27B on vLLM (100 requests), but the software stack is a nightmare. MoE is barely supported, quantizing new architectures is incredibly fragile, and you will fight the environment every step of the way. Definitely not for the faint of heart.
Hey everyone,
I ordered the Intel Arc Pro B70 on the 27th, right when it released. I’ve previously wrestled with ROCm on my 7840HS, so my thought process was, "How much worse could it really be?" Turns out, it can be a complete mess.
To be totally fair, I have to admit that a good chunk of my pain is entirely self-inflicted. I used this hardware upgrade as an excuse to completely overhaul my environment:
OS: Moved from Ubuntu 25.10 (with a GUI) to Fedora 43 Server.
Engine: Transitioned from Ollama -> llama.cpp -> vLLM. (Intel is heavily supporting vLLM, and I’m optimizing for request density, so this seemed like a no-brainer).
Deployment: Moved everything over to containers and IaC.
I figured going the container/IaC route would make things more stable and repeatable. I’ve even been cheating my way through some of it by using Claude Code to help build out my containers. But at every turn, running new models has been a massive headache.
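For anyone heading down the same road, this is roughly the shape of the container launch I converged on. The image tag and model path are placeholders (not official names), and the flags reflect my setup, not a guaranteed recipe:

```shell
# Sketch of a containerized vLLM launch on an Intel GPU, assuming a
# vLLM image built with XPU support (the image tag below is a
# placeholder, not an official Intel/vLLM name).
# Gotchas this addresses: the DRI render nodes must be passed through,
# the container needs render-group access (podman's keep-groups), and
# Fedora wants the :Z SELinux label on bind mounts.
podman run --rm -p 8000:8000 \
  --device /dev/dri \
  --group-add keep-groups \
  -v ~/models:/models:Z \
  vllm-xpu:latest \
  vllm serve /models/gemma-3-27b-autoround
```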
The Good
When it actually works, the throughput is fantastic. I was able to run a Gemma 3 27B Intel AutoRound quant, and in a vLLM benchmark it generated an aggregate 235 t/s across 100 requests. For a local deployment prioritizing request density, those numbers are exactly what I was hoping for.
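For reference, this is roughly how I measured it, assuming a recent vLLM with the `vllm bench` subcommand (older releases ship a `benchmarks/benchmark_serving.py` script instead). Flags are from memory and the model path is a placeholder; check `vllm bench serve --help` on your version:

```shell
# Terminal 1: serve the AutoRound quant (placeholder path).
vllm serve /models/gemma-3-27b-autoround --port 8000

# Terminal 2: fire 100 requests at the server and report aggregate
# output-token throughput (the t/s figure quoted above).
vllm bench serve \
  --model /models/gemma-3-27b-autoround \
  --num-prompts 100
```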
The Bad & The Gotchas
The ecosystem just isn't ready for a frictionless experience yet:
MoE Support: Mixture of Experts models are still only partially supported and incredibly finicky.
Quantization Nightmares: I'm currently trying to run a quant through AutoRound for Gemma 4 26B. I’ve watched it blow up at least 30 times. The new architecture and dynamic attention heads just do not play nicely with the current tooling.
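For context, this is the shape of the AutoRound run in question, assuming the `auto-round` CLI from Intel's auto-round project (flags from memory; see `auto-round --help`, and the model path is a placeholder). The same invocation works for me on Gemma 3; the newer architecture is where it falls over:

```shell
# Hedged sketch of a 4-bit AutoRound quantization run; this is the
# command that keeps blowing up on the newer architecture.
auto-round \
  --model /models/gemma-4-26b \
  --bits 4 \
  --group_size 128 \
  --format auto_round \
  --output_dir ./gemma-4-26b-w4
```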
Container Friction: I've run into at least 7 distinct "gotchas" just trying to get the Intel drivers and vLLM to play nicely inside containerized environments.
I haven't even tried spinning up llama.cpp on this card yet, but based on the vLLM experience, I'm bracing myself.
Final Thoughts
My background is as a Cloud Engineer. I’ve spent a lot of time hosting SaaS apps across Windows and Linux environments, so while I'm not a pure developer, I am very comfortable with dev-adjacent workflows and troubleshooting infrastructure. Even with that background, getting this B70 to do what I want has been an uphill battle.
If you are looking for a plug-and-play experience, stay far away. But if you have the patience to fight the stack, the raw performance is definitely there, hiding under the bugs.