Just sharing the results from experimenting with the Arc Pro B70 on my setup.
These results compare three llama.cpp execution paths on the same machine:
- RTX 3090 (Vulkan) on NixOS host, using main llama.cpp repo (compiled on 4/21/2026)
- Arc Pro B70 (Vulkan) on NixOS host, using main llama.cpp repo (compiled on 4/21/2026)
- Arc Pro B70 (SYCL) inside an Ubuntu 24.04 Docker container, using a separate SYCL-enabled llama-bench build from the aicss-genai/llama.cpp fork
Prompt processing (pp512)
| model | RTX 3090 Vulkan (t/s) | Arc Pro B70 Vulkan (t/s) | Arc Pro B70 SYCL (t/s) | B70 best vs 3090 | B70 SYCL vs B70 Vulkan |
|---|---|---|---|---|---|
| TheBloke/Llama-2-7B-GGUF:Q4_K_M | 4550.27 ± 10.90 | 1236.65 ± 3.19 | 1178.54 ± 5.74 | -72.8% | -4.7% |
| unsloth/gemma-4-E2B-it-GGUF:Q4_K_XL | 9359.15 ± 168.11 | 2302.80 ± 5.26 | 3462.19 ± 36.07 | -63.0% | +50.3% |
| unsloth/gemma-4-26B-A4B-it-GGUF:Q4_K_M | 3902.28 ± 21.37 | 1126.28 ± 6.17 | 945.89 ± 17.53 | -71.1% | -16.0% |
| unsloth/gemma-4-31B-it-GGUF:Q4_K_XL | 991.47 ± 1.73 | 295.66 ± 0.60 | 268.50 ± 0.65 | -70.2% | -9.2% |
| ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF:Q8_0 | 4740.04 ± 13.78 | 1176.34 ± 1.68 | 1192.99 ± 5.75 | -74.8% | +1.4% |
| ggml-org/Qwen3-Coder-30B-A3B-Instruct-Q8_0-GGUF:Q8_0 | oom | 990.32 ± 5.34 | 552.37 ± 5.76 | n/a | -44.2% |
| Qwen/Qwen3-8B-GGUF:Q8_0 | 4195.89 ± 41.31 | 1048.39 ± 2.66 | 1098.90 ± 1.02 | -73.8% | +4.8% |
| unsloth/Qwen3.5-4B-GGUF:Q4_K_XL | 5233.55 ± 8.29 | 1430.72 ± 9.68 | 1767.21 ± 21.27 | -66.2% | +23.5% |
| unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M | 3357.03 ± 18.47 | 886.39 ± 6.14 | 445.56 ± 7.46 | -73.6% | -49.7% |
| unsloth/Qwen3.6-35B-A3B-GGUF:Q4_K_M | 3417.76 ± 17.84 | 878.15 ± 5.32 | 442.01 ± 6.51 | -74.3% | -49.7% |
| Average (excluding oom) | | | | -71.1% | |
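The relative columns can be re-derived from the raw throughput numbers. A minimal Python sketch, using the first pp512 row (Llama-2-7B Q4_K_M) as an example; the numbers are copied from the table above:

```python
def pct_delta(new, baseline):
    """Relative change of `new` vs `baseline`, in percent."""
    return (new / baseline - 1.0) * 100.0

# First pp512 row: TheBloke/Llama-2-7B-GGUF:Q4_K_M (values from the table above)
rtx3090 = 4550.27
b70_vulkan = 1236.65
b70_sycl = 1178.54

# "B70 best" takes the faster of the two B70 backends for that row
b70_best = max(b70_vulkan, b70_sycl)
print(round(pct_delta(b70_best, rtx3090), 1))     # -72.8  (B70 best vs 3090)
print(round(pct_delta(b70_sycl, b70_vulkan), 1))  # -4.7   (B70 SYCL vs B70 Vulkan)
```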
Token generation (tg128)
| model | RTX 3090 Vulkan (t/s) | Arc Pro B70 Vulkan (t/s) | Arc Pro B70 SYCL (t/s) | B70 best vs 3090 | B70 SYCL vs B70 Vulkan |
|---|---|---|---|---|---|
| TheBloke/Llama-2-7B-GGUF:Q4_K_M | 137.92 ± 0.41 | 58.61 ± 0.09 | 92.39 ± 0.30 | -33.0% | +57.6% |
| unsloth/gemma-4-E2B-it-GGUF:Q4_K_XL | 207.21 ± 2.00 | 89.33 ± 0.60 | 70.65 ± 0.84 | -56.9% | -20.9% |
| unsloth/gemma-4-26B-A4B-it-GGUF:Q4_K_M | 131.33 ± 0.14 | 42.00 ± 0.01 | 37.75 ± 0.32 | -68.0% | -10.1% |
| unsloth/gemma-4-31B-it-GGUF:Q4_K_XL | 31.49 ± 0.05 | 14.49 ± 0.04 | 18.30 ± 0.05 | -41.9% | +26.3% |
| ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF:Q8_0 | 98.96 ± 0.56 | 21.30 ± 0.03 | 55.37 ± 0.02 | -44.1% | +160.0% |
| ggml-org/Qwen3-Coder-30B-A3B-Instruct-Q8_0-GGUF:Q8_0 | oom | 37.69 ± 0.03 | 28.58 ± 0.09 | n/a | -24.2% |
| Qwen/Qwen3-8B-GGUF:Q8_0 | 92.29 ± 0.17 | 19.78 ± 0.01 | 50.74 ± 0.02 | -45.0% | +156.5% |
| unsloth/Qwen3.5-4B-GGUF:Q4_K_XL | 162.58 ± 0.76 | 60.45 ± 0.06 | 79.09 ± 0.05 | -51.4% | +30.8% |
| unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M | 148.01 ± 0.38 | 43.30 ± 0.05 | 37.93 ± 0.89 | -70.7% | -12.4% |
| unsloth/Qwen3.6-35B-A3B-GGUF:Q4_K_M | 148.64 ± 0.53 | 43.46 ± 0.02 | 36.87 ± 0.42 | -70.8% | -15.2% |
| Average (excluding oom) | | | | -53.5% | |
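The two averages are plain arithmetic means over the "B70 best vs 3090" column, with the oom row dropped; the deltas below are copied from the tables above:

```python
from statistics import mean

# "B70 best vs 3090" deltas, oom rows excluded (values from the tables above)
pp_deltas = [-72.8, -63.0, -71.1, -70.2, -74.8, -73.8, -66.2, -73.6, -74.3]
tg_deltas = [-33.0, -56.9, -68.0, -41.9, -44.1, -45.0, -51.4, -70.7, -70.8]

print(round(mean(pp_deltas), 1))  # -71.1  (pp512 average)
print(round(mean(tg_deltas), 1))  # -53.5  (tg128 average)
```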
Commands used
Host Vulkan runs
For each model, the host benchmark commands were:
```
llama-bench -hf <MODEL> -dev Vulkan0
llama-bench -hf <MODEL> -dev Vulkan2
```

Where:

- Vulkan0 = RTX 3090
- Vulkan2 = Arc Pro B70
Container SYCL runs
For each model, the SYCL benchmark was run inside the Docker container with:
```
./build/bin/llama-bench -hf <MODEL> -dev SYCL0
```

Where:

- SYCL0 = Arc Pro B70
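The post doesn't show how the container itself was launched. A plausible sketch, assuming Intel's GPU render nodes are passed through via `/dev/dri`; the image name and the mount path are hypothetical, only the llama-bench invocation comes from the post:

```shell
# Hypothetical container launch for the SYCL runs.
# --device /dev/dri exposes the Intel GPU render nodes to the container;
# the image name and model-cache mount are assumptions.
docker run --rm -it \
  --device /dev/dri \
  -v "$PWD/models:/models" \
  sycl-llama-cpp:latest \
  ./build/bin/llama-bench -hf <MODEL> -dev SYCL0
```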
Test machine
- CPU: AMD Ryzen Threadripper 2970WX 24-Core Processor
- 24 cores / 48 threads
- 1 socket
- 2.2 GHz min / 3.0 GHz max
- RAM: 128 GiB total
- GPUs:
- NVIDIA GeForce RTX 3090, 24 GiB
- NVIDIA GeForce RTX 3090, 24 GiB
- Intel Arc Pro B70, 32 GiB

