Vulkan backend outperforms ROCm on Strix Halo (gfx1151) — llama.cpp benchmark

Reddit r/LocalLLaMA / 5/5/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A user ran llama.cpp benchmarks on an AMD Radeon 8060S (Strix Halo, gfx1151) and found the Vulkan backend outperformed the ROCm backend.
  • For the tested Qwen3.6-35B-A3B MoE model (Q6_K quantized, ~30GB), Vulkan achieved higher token generation throughput (~21% faster) with lower variance than ROCm.
  • Prompt processing performance was roughly similar between Vulkan and ROCm in this workload.
  • The test used llama.cpp commit 27aef3dd9 and compiled both backends into the same binary (`-DGGML_HIP=ON -DGGML_VULKAN=ON`); selecting `-dev Vulkan0` produced better results.
  • The user suspects ROCm is falling back to slower code paths for some operations on this GPU, and asks whether others on Strix Halo or similar RDNA3.5 chips observe the same trend.

Just ran some llama-bench comparisons between ROCm and Vulkan backends on my Strix Halo system. Vulkan came out ahead, which surprised me.

Hardware:

- AMD Radeon 8060S (gfx1151 / Strix Halo)
- 64 GB unified memory
- Arch Linux, ROCm 7.2.2 via pacman
- Mesa RADV Vulkan driver

Model: Qwen3.6-35B-A3B (MoE, Q6_K quantized, ~30GB)

llama.cpp: commit 27aef3dd9

Flags: `-ngl 99 -p 512 -n 128 -t 8 -fa 1 -b 2048 -ub 512`
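
Putting the pieces together, the full run looked roughly like this (the model filename is a placeholder for the Q6_K GGUF):

```
# Model path is illustrative; -dev picks which compiled-in device runs the bench.
./build/bin/llama-bench -m ./qwen3.6-35b-a3b-q6_k.gguf \
  -ngl 99 -p 512 -n 128 -t 8 -fa 1 -b 2048 -ub 512 \
  -dev Vulkan0    # swap in ROCm0 for the HIP backend
```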

Results (tokens/sec):

| Backend | pp512 | tg128 | Std Dev |
|---------|-------|-------|---------|
| ROCm0   | 841   | 42.3  | ±1.8    |
| Vulkan0 | 867   | 51.2  | ±0.5    |

Vulkan is ~21% faster at token generation (51.2 / 42.3 ≈ 1.21) and more consistent run to run (±0.5 vs ±1.8). Prompt processing is roughly equal (867 vs 841, about a 3% difference).

I built both backends into the same binary (`-DGGML_HIP=ON -DGGML_VULKAN=ON`). Using `-dev Vulkan0` gives better results than ROCm for this workload.
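
For anyone reproducing this, the build was along these lines. This is a sketch: the HIP env vars follow llama.cpp's documented HIP build, and the gfx1151 target is what matches this chip; your paths and ROCm install may differ.

```
# Compile HIP + Vulkan backends into one binary.
# HIPCXX/HIP_PATH and AMDGPU_TARGETS=gfx1151 match my Arch/ROCm setup; adjust as needed.
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -B build -DGGML_HIP=ON -DGGML_VULKAN=ON \
        -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
```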

Curious if anyone else on Strix Halo or other RDNA3.5 chips has seen the same thing. ROCm seems to fall back to slower code paths for certain ops on this GPU.
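
If anyone wants to poke at the fallback theory: llama.cpp ships a `test-backend-ops` tool that exercises individual ggml ops on each compiled backend against the CPU reference, and it reports ops a backend doesn't claim support for (those are the candidates for slow fallback paths). Exact flags may vary by version; something like:

```
# Correctness pass: flags unsupported ops per backend.
./build/bin/test-backend-ops test
# Timing pass for individual ops:
./build/bin/test-backend-ops perf
```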

submitted by /u/FeiX7