Llama.cpp Mi50 ROCm 7 vs Vulkan Benchmarks

Reddit r/LocalLLaMA / 3/23/2026

Key Points

Benchmark compares ROCm 7.13 nightly against Vulkan 1.4.341.1 on an Mi50 32GB system (EPYC 7532, Proxmox virtualization, Ubuntu Server 24.04, kernel 6.8) using llama.cpp build 8467 and llama-bench for testing.
Models tested include Qwen 3.5 9B/27B/122B and Nemotron Cascade 2, with 28 layers of the 122B offloaded to the CPU (-ncmoe 28, -mmp 0).
In prompt processing, Vulkan is faster for short-context (under 16k) runs on dense models, while ROCm wins at longer contexts and on MoE models at essentially any context length.
In token generation (256 tokens), the same pattern holds, and MoE models, especially with split GPU/CPU inference, again favor ROCm. All runs used -fa 1 and the default f16 KV caches.

Testing ROCm 7 using TheRock nightly tarballs against Vulkan on Mi50.

System Setup

Component	Spec	Note
GPU	1x Mi50 32GB	113-D1631700-111 vbios
CPU	EPYC 7532	Proxmox virtualized, 28c/56t allocated
RAM	8x16GB DDR4 2933 MHz
OS	Ubuntu Server 24.04	Kernel 6.8.0-106-generic
ROCm Version	7.13.0a20260321	TheRock Nightly Page
Vulkan	1.4.341.1
Llama.cpp Build	8467	Built using the recommended commands from the build wiki

Models Tested

All models were run in llama-bench with -fa 1 and the default f16 KV cache types.

Model	Quant	Notes
Qwen 3.5 9B	Bartowski Q8_0
Qwen 3.5 27B	Bartowski Q8_0
Qwen 3.5 122B	Bartowski Q4_0	28 layers offloaded to CPU with -ncmoe 28, -mmp 0
Nemotron Cascade 2	mradermacher i1-Q5_K_M
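For anyone wanting to reproduce this matrix, here is a minimal Python sketch that assembles llama-bench invocations with the settings listed above (-fa 1, 256 generated tokens, default f16 caches, and -ncmoe 28 -mmp 0 for the 122B). The GGUF filenames, the depth list, and the helper names are hypothetical, since the post doesn't give exact paths or the depths tested. It assumes one llama-bench binary per backend build, passed in via the binary argument.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class BenchConfig:
    """One model's llama-bench settings. Filenames below are hypothetical."""
    model_path: str
    extra_args: list[str] = field(default_factory=list)


def build_llama_bench_cmd(cfg, binary="llama-bench", depths=(0, 4096, 16384), n_gen=256):
    """Return an argv list using the post's settings. The default f16 KV cache
    needs no flags, so -ctk/-ctv are simply left out."""
    cmd = [
        binary,
        "-m", cfg.model_path,
        "-fa", "1",                              # flash attention on, as in the post
        "-n", str(n_gen),                        # tokens generated per tg test
        "-d", ",".join(str(d) for d in depths),  # context depths (assumed values)
    ]
    return cmd + list(cfg.extra_args)


CONFIGS = {
    "qwen3.5-9b": BenchConfig("Qwen3.5-9B-Q8_0.gguf"),
    "qwen3.5-27b": BenchConfig("Qwen3.5-27B-Q8_0.gguf"),
    # 122B: -ncmoe 28 keeps 28 layers' MoE weights on the CPU; -mmp 0 disables mmap
    "qwen3.5-122b": BenchConfig("Qwen3.5-122B-Q4_0.gguf", ["-ncmoe", "28", "-mmp", "0"]),
    "nemotron-cascade-2": BenchConfig("Nemotron-Cascade-2.i1-Q5_K_M.gguf"),
}

if __name__ == "__main__":
    # Swap `binary` between a ROCm build and a Vulkan build to compare backends.
    for name, cfg in CONFIGS.items():
        print(name, shlex.join(build_llama_bench_cmd(cfg)))
```

Feeding the same argument list to the ROCm binary and the Vulkan binary keeps every other setting identical, so the backend is the only variable between runs.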

Prompt Processing

At short context (under 16k), Vulkan is reliably faster than ROCm, but only on the dense models (Qwen 3.5 9B and 27B). At long context on dense models, and at essentially any context length on MoE models, ROCm is consistently faster.
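To pin down where that crossover sits for a given model, one option is to compare per-depth prompt processing throughput for the two backends. This is a minimal sketch, assuming you've already pulled tokens/s figures per depth out of llama-bench's output (for example from the full dataset). The function names are hypothetical, and the numbers in the usage lines are placeholders, not measurements from this benchmark.

```python
def crossover_depth(depths, rocm_ts, vulkan_ts):
    """Return the smallest depth at which ROCm's tokens/s beats Vulkan's,
    or None if Vulkan stays ahead (or ties) at every measured depth."""
    if not (len(depths) == len(rocm_ts) == len(vulkan_ts)):
        raise ValueError("depths and throughput lists must have equal length")
    for depth, rocm, vulkan in sorted(zip(depths, rocm_ts, vulkan_ts)):
        if rocm > vulkan:
            return depth
    return None


def speedup(rocm_ts, vulkan_ts):
    """ROCm-over-Vulkan ratio per depth; values above 1.0 favor ROCm."""
    return [r / v for r, v in zip(rocm_ts, vulkan_ts)]


# Placeholder throughputs (tokens/s), NOT results from the post's dataset.
depths = [0, 4096, 16384, 32768]
rocm = [100.0, 90.0, 70.0, 50.0]
vulkan = [120.0, 100.0, 60.0, 30.0]
print(crossover_depth(depths, rocm, vulkan))            # 16384
print([round(s, 2) for s in speedup(rocm, vulkan)])     # [0.83, 0.9, 1.17, 1.67]
```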

Token Generation

All generation runs were standardized at 256 tokens across varying context depths. The prompt processing pattern repeats here: Vulkan is faster with dense models, although generation speed decays less with depth than prompt processing speed does. If you're using MoEs, and especially split GPU/CPU inference, ROCm is faster.

Conclusions

Vulkan is the winner for dense models at short context. If you're chatting with dense models and starting new chats often, Vulkan wins.
ROCm is faster for anything beyond 16k context once prompt processing and generation speeds are considered together. Dense or MoE doesn't matter once Vulkan prompt processing falls off a cliff; the Vulkan prompt processing numbers at depth (not pictured, but included in the full dataset below) were bleak. However, read the limitations below, as the nightly builds do sacrifice stability.
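The "combined" argument can be made concrete with simple arithmetic. A request's wall-clock time is roughly the prompt tokens divided by prompt processing speed, plus the generated tokens divided by generation speed. The sketch below assumes a single average rate per phase, which llama-bench's per-depth figures only approximate. It uses made-up rates for two generic backends, A and B, purely to show how a slower prefill can outweigh a faster decode at long context.

```python
def request_seconds(prompt_tokens, gen_tokens, pp_ts, tg_ts):
    """Approximate time for one request: prefill time plus decode time.
    pp_ts and tg_ts are assumed average tokens/s for each phase."""
    if pp_ts <= 0 or tg_ts <= 0:
        raise ValueError("throughputs must be positive")
    return prompt_tokens / pp_ts + gen_tokens / tg_ts


def faster_backend(prompt_tokens, gen_tokens, rates):
    """rates maps backend name -> (pp_ts, tg_ts); returns the quickest backend."""
    return min(rates, key=lambda name: request_seconds(prompt_tokens, gen_tokens, *rates[name]))


# Hypothetical rates: backend A prefills twice as fast but decodes slower than B.
rates = {"A": (400.0, 20.0), "B": (200.0, 25.0)}
print(request_seconds(16000, 256, *rates["A"]))  # 16000/400 + 256/20 = 40 + 12.8 seconds
print(request_seconds(16000, 256, *rates["B"]))  # 16000/200 + 256/25 = 80 + 10.24 seconds
print(faster_backend(16000, 256, rates))         # A
print(faster_backend(500, 256, rates))           # B
```

With a short 500-token prompt the faster decoder (B) wins, while at 16k tokens the prefill term dominates and the faster prefiller (A) wins.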

Limitations

TheRock's ROCm nightly builds are not a stable release, and you will probably encounter weird behavior. I'm not sure whether it's a ROCm bug or a llama.cpp bug, but I currently cannot run ROCm llama-server with Qwen 3.5 27B Q8: it keeps trying to allocate the 8192 MB prompt cache in VRAM instead of system RAM, causing an OOM error (-cram 0 isn't disabling it and -cram 1024 doesn't lower the size; I don't know why). It does run with Vulkan, though.

I also noticed what seemed to be a memory leak with a different ROCm nightly from a few weeks ago and an earlier llama.cpp version, which was resolved by switching back to Vulkan. Using OpenCode with 100k+ context and Qwen Next Coder on that ROCm nightly, GPU memory usage slowly crept up from 90% until it hit an OOM. I haven't tried to replicate it since switching back to ROCm with the newer nightly, though.
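If you want to check for the kind of slow VRAM creep described above, one low-tech approach on Linux is to poll the amdgpu driver's sysfs counters (mem_info_vram_used and mem_info_vram_total, both in bytes) during a long session. This is a rough sketch, not the method used here. The card0 path is an assumption (the Mi50 may sit at a different card index on your system), and the "creep" heuristic and its 2% threshold are arbitrary, hypothetical choices.

```python
import time
from pathlib import Path


def parse_fraction(used_text, total_text):
    """Turn the raw sysfs strings (bytes) into a used/total fraction."""
    return int(used_text.strip()) / int(total_text.strip())


def vram_used_fraction(device_dir):
    """Read amdgpu's VRAM counters for one device directory."""
    device = Path(device_dir)
    return parse_fraction(
        (device / "mem_info_vram_used").read_text(),
        (device / "mem_info_vram_total").read_text(),
    )


def is_creeping(samples, min_rise=0.02):
    """Heuristic: usage never drops between samples and climbs by at least
    min_rise (as a fraction of total VRAM) from first to last sample."""
    if len(samples) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] - samples[0] >= min_rise


def watch(device_dir="/sys/class/drm/card0/device", interval_s=60.0, count=30):
    """Sample VRAM usage `count` times, `interval_s` seconds apart."""
    samples = []
    for i in range(count):
        samples.append(vram_used_fraction(device_dir))
        print(f"sample {i}: {samples[-1]:.1%} of VRAM in use")
        if i < count - 1:
            time.sleep(interval_s)
    return samples, is_creeping(samples)
```

For example, samples, creeping = watch(interval_s=30, count=120) would sample for about an hour alongside a long OpenCode session. Real leaks may not be strictly monotonic, so the never-drops rule may need loosening.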

I'm an ex-dev turned product manager just learning and doing this as a hobby though, so it's fine :)

Full data set: https://pastebin.com/4pPuGAcV

submitted by /u/JaredsBored