Just ran some llama-bench comparisons between ROCm and Vulkan backends on my Strix Halo system. Vulkan came out ahead, which surprised me.
Hardware:
- AMD Radeon 8060S (gfx1151 / Strix Halo)
- 64GB unified VRAM
- Arch Linux, ROCm 7.2.2 via pacman
- Mesa RADV Vulkan driver
Model: Qwen3.6-35B-A3B (MoE, Q6_K quantized, ~30GB)
llama.cpp: commit 27aef3dd9
Flags: -ngl 99 -p 512 -n 128 -t 8 -fa 1 -b 2048 -ub 512
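For anyone wanting to reproduce this, the full invocation looks roughly like the following. The binary and model paths are placeholders for my local setup; swap in your own:

```shell
# Placeholder path; point MODEL at your own GGUF file.
MODEL=./models/qwen-moe-q6_k.gguf

# Same flags, once per backend device:
./build/bin/llama-bench -m "$MODEL" -dev ROCm0 \
  -ngl 99 -p 512 -n 128 -t 8 -fa 1 -b 2048 -ub 512
./build/bin/llama-bench -m "$MODEL" -dev Vulkan0 \
  -ngl 99 -p 512 -n 128 -t 8 -fa 1 -b 2048 -ub 512
```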
Results (tokens/sec):
| Backend | pp512 | tg128 | Std Dev |
|---------|-------|-------|---------|
| ROCm0 | 841 | 42.3 | ±1.8 |
| Vulkan0 | 867 | 51.2 | ±0.5 |
Vulkan is ~21% faster at token generation and noticeably more stable run-to-run (±0.5 vs ±1.8). Prompt processing is roughly a wash (Vulkan ~3% ahead).
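Quick sanity check on the tg128 numbers above:

```shell
# tg128 speedup: Vulkan 51.2 t/s vs ROCm 42.3 t/s
awk 'BEGIN { printf "%.1f%%\n", (51.2 / 42.3 - 1) * 100 }'
# prints 21.0%
```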
I built both backends into the same binary (`-DGGML_HIP=ON -DGGML_VULKAN=ON`), so switching is just a matter of the `-dev` flag. Selecting `-dev Vulkan0` consistently beats `-dev ROCm0` for this workload.
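In case it helps, the dual-backend build was roughly this. The `AMDGPU_TARGETS` value is my assumption for gfx1151, and the exact flag name can vary between llama.cpp versions, so check the build docs for your checkout:

```shell
# Build llama.cpp with both the HIP (ROCm) and Vulkan backends in one binary.
cmake -B build -DGGML_HIP=ON -DGGML_VULKAN=ON \
      -DAMDGPU_TARGETS=gfx1151   # assumed arch flag for Strix Halo; verify for your version
cmake --build build --config Release -j
```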
Curious if anyone else on Strix Halo or other RDNA3.5 chips has seen the same thing. ROCm seems to fall back to slower code paths for certain ops on this GPU.