B70: クイックかつ初期のベンチマーク＆バックエンド比較

Reddit r/LocalLLaMA / 2026/4/4

💬 オピニオンDeveloper Stack & InfrastructureSignals & Early TrendsTools & Practical Usage

共有:

要点

本投稿では、llama.cpp（コミット f1f793ad0）において、27BのQwen3.5 GGUFモデルをSYCL、Vulkan、OpenVINOの各バックエンドで実行した、初期ベンチマークとバックエンド比較を報告している。
SYCLの結果では、特定の構成で非常に大きなスループットが示されている（例：pp512で約~798 t/s、pp16384で約~709 t/s）。一方で、テンソル分割のバリアント（tg128/tg512）は大幅に遅く（約~15 t/s）なっている。
Vulkanは、Intel(R) Graphics BMG G31（Mesaベース）上で同モデルを実行した場合、SYCLより低いスループットとなっている。pp512は約~504 t/s、pp16384は約~449 t/sであり、tg128/tg512は依然として約~14 t/s程度に留まる。
OpenVINOについては、GPUパスでのエンドツーエンド動作がまだ確実に動作していないと説明されており、事前に確保されたテンソルがCPY操作を実行できないことを示す実行エラーが出ている。
著者はこれを「ようやく動き始める直前」と位置づけており、oneAPIランタイムの安定性、更新されたカーネル／xeファームウェア、ソースからビルドしたMesaなどの依存関係があることを挙げている。これにより、環境はまだ変動の途中であることが示唆される。

llama.cpp: f1f793ad0 (8657)

This is a quick attempt to just get it up and running. Lots of oneapi runtime still using "stable" from Intels repo. Kernel 6.19.8+deb13-amd64 with an updated xe firmware built. Vulkan is Debian but using latest Mesa compiled from source. Openvino is 2026.0. Feels like everything is "barely on the brink of working" (which is to be expected).

sycl:

$ build/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -p 512,16384 -n 128,512 | model | size | params | backend | ngl | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: | | qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | SYCL | 99 | pp512 | 798.07 ± 2.72 | | qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | SYCL | 99 | pp16384 | 708.99 ± 1.90 | | qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | SYCL | 99 | tg128 | 15.64 ± 0.01 | | qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | SYCL | 99 | tg512 | 15.61 ± 0.00 |

Vulkan:

$ bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -p 512,16384 -n 128,512 ggml_vulkan: Found 1 Vulkan devices: ggml_vulkan: 0 = Intel(R) Graphics (BMG G31) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2 | model | size | params | backend | ngl | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: | | qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | Vulkan | 99 | pp512 | 504.19 ± 0.26 | | qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | Vulkan | 99 | pp16384 | 448.74 ± 0.04 | | qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | Vulkan | 99 | tg128 | 14.10 ± 0.01 | | qwen35 27B Q4_K - Medium | 16.40 GiB | 26.90 B | Vulkan | 99 | tg512 | 14.08 ± 0.00 |

Openvino:

$ GGML_OPENVINO_DEVICE=GPU GGML_OPENVINO_STATEFUL_EXECUTION=1 build_ov/bin/llama-bench -hf unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL -p OpenVINO: using device GPU | model | size | params | backend | ngl | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: | /home/aaron/src/llama.cpp/ggml/src/ggml-backend.cpp:809: pre-allocated tensor (cache_r_l0 (view) (copy of )) in a buffer (OPENVINO0) that cannot run the operation (CPY) /home/aaron/src/llama.cpp/build_ov/bin/libggml-base.so.0(+0x15a25) [0x7f6183d72a25] /home/aaron/src/llama.cpp/build_ov/bin/libggml-base.so.0(ggml_print_backtrace+0x1df) [0x7f6183d72def] /home/aaron/src/llama.cpp/build_ov/bin/libggml-base.so.0(ggml_abort+0x11e) [0x7f6183d72f7e] /home/aaron/src/llama.cpp/build_ov/bin/libggml-base.so.0(+0x2cf9c) [0x7f6183d89f9c] /home/aaron/src/llama.cpp/build_ov/bin/libggml-base.so.0(ggml_backend_sched_split_graph+0xd3f) [0x7f6183d8bfbf] /home/aaron/src/llama.cpp/build_ov/bin/libllama.so.0(_ZN13llama_context13graph_reserveEjjjPK22llama_memory_context_ibPm+0x5f6) [0x7f6183ebd466] /home/aaron/src/llama.cpp/build_ov/bin/libllama.so.0(_ZN13llama_context13sched_reserveEv+0xf75) [0x7f6183ebf3f5] /home/aaron/src/llama.cpp/build_ov/bin/libllama.so.0(_ZN13llama_contextC2ERK11llama_model20llama_context_params+0xab9) [0x7f6183ec07d9] /home/aaron/src/llama.cpp/build_ov/bin/libllama.so.0(llama_init_from_model+0x11f) [0x7f6183ec155f] build_ov/bin/llama-bench(+0x309bf) [0x55fc464089bf] /lib/x86_64-linux-gnu/libc.so.6(+0x29ca8) [0x7f6183035ca8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f6183035d65] build_ov/bin/llama-bench(+0x32e71) [0x55fc4640ae71] Aborted

(I swear I had this running before getting Vulkan going)

submitted by /u/abotsis
[link] [comments]