llama.cpp ROCm Benchmarks – MI60 32GB VRAM
Hardware: MI60 32GB VRAM, i9-14900K, 96GB DDR5-5600
Build: 43e1cbd6c (8255)
Backend: ROCm, Flash Attention enabled
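All of the tables below are llama-bench output. An invocation along these lines should reproduce the same columns (the model path is a placeholder, and I'm assuming -d takes a comma-separated list of depths like the other sweep flags, so treat this as a sketch rather than the exact command):

```bash
# Sketch only - the model path is a placeholder.
#   -ngl 999                 offload every layer to the GPU (the "ngl" column)
#   -fa 1                    flash attention on (the "fa" column)
#   -p 512 -n 128            the pp512 and tg128 tests
#   -d 0,5000,20000,100000   prefilled context depth for the "@ dNNNN" rows
./build/bin/llama-bench -m models/qwen35-4b-q4_k_m.gguf \
  -ngl 999 -fa 1 -p 512 -n 128 -d 0,5000,20000,100000
```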
Qwen 3.5 4B Q4_K (Medium)
| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | pp512 | 1232.35 ± 1.05 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | tg128 | 49.48 ± 0.03 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | pp512 @ d5000 | 1132.48 ± 2.11 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | tg128 @ d5000 | 48.47 ± 0.06 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | pp512 @ d20000 | 913.43 ± 1.37 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | tg128 @ d20000 | 46.67 ± 0.08 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | pp512 @ d100000 | 410.46 ± 1.30 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | tg128 @ d100000 | 39.56 ± 0.06 |
Qwen 3.5 4B Q8_0
| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | pp512 | 955.33 ± 1.66 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | tg128 | 43.02 ± 0.06 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | pp512 @ d5000 | 887.37 ± 2.23 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | tg128 @ d5000 | 42.32 ± 0.06 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | pp512 @ d20000 | 719.60 ± 1.60 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | tg128 @ d20000 | 39.25 ± 0.19 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | pp512 @ d100000 | 370.46 ± 1.17 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | tg128 @ d100000 | 33.47 ± 0.27 |
Qwen 3.5 9B Q4_K (Medium)
| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | pp512 | 767.11 ± 5.37 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | tg128 | 41.23 ± 0.39 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | pp512 @ d5000 | 687.61 ± 4.25 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | tg128 @ d5000 | 39.08 ± 0.11 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | pp512 @ d20000 | 569.65 ± 20.82 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | tg128 @ d20000 | 37.58 ± 0.21 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | pp512 @ d100000 | 337.25 ± 2.22 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | tg128 @ d100000 | 32.25 ± 0.33 |
Qwen 3.5 9B Q8_0
| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | pp512 | 578.33 ± 0.63 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | tg128 | 30.25 ± 1.09 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | pp512 @ d5000 | 527.08 ± 11.25 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | tg128 @ d5000 | 28.38 ± 0.12 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | pp512 @ d20000 | 465.11 ± 2.30 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | tg128 @ d20000 | 27.38 ± 0.57 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | pp512 @ d100000 | 291.10 ± 0.87 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | tg128 @ d100000 | 24.80 ± 0.11 |
Qwen 3.5 27B Q5_K (Medium)
| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | pp512 | 202.53 ± 1.97 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | tg128 | 12.87 ± 0.27 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | pp512 @ d5000 | 179.92 ± 0.40 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | tg128 @ d5000 | 12.26 ± 0.03 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | pp512 @ d20000 | 158.60 ± 0.74 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | tg128 @ d20000 | 11.48 ± 0.06 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | pp512 @ d100000 | 99.18 ± 0.66 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | tg128 @ d100000 | 8.31 ± 0.07 |
Qwen 3.5 MoE 35B.A3B Q4_K (Medium)
| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | pp512 | 851.50 ± 20.61 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | tg128 | 40.37 ± 0.13 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | pp512 @ d5000 | 793.63 ± 2.93 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | tg128 @ d5000 | 39.50 ± 0.42 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | pp512 @ d20000 | 625.67 ± 4.06 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | tg128 @ d20000 | 39.22 ± 0.02 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | pp512 @ d100000 | 304.23 ± 1.19 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | tg128 @ d100000 | 36.10 ± 0.03 |
Qwen 3.5 MoE 35B.A3B Q6_K
| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | pp512 | 855.91 ± 2.38 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | tg128 | 40.10 ± 0.13 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | pp512 @ d5000 | 747.68 ± 84.40 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | tg128 @ d5000 | 39.56 ± 0.06 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | pp512 @ d20000 | 617.59 ± 3.76 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | tg128 @ d20000 | 38.76 ± 0.45 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | pp512 @ d100000 | 294.08 ± 20.35 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | tg128 @ d100000 | 35.54 ± 0.53 |
Lastly: a model larger than my VRAM
I had to do this one a little differently: llama-bench wasn't playing well with the sharded downloads, so I merged them into a single GGUF. That in turn meant I couldn't use all the llama-bench flags I wanted, so I ran llama-server instead and gave it a healthy prompt.
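Roughly, those two steps look like this. The shard file names, context size and port are placeholders rather than the exact commands I used, so treat it as a sketch:

```bash
# Merge the sharded download into a single GGUF
# (pass the first shard; file names below are placeholders).
./build/bin/llama-gguf-split --merge \
  Qwen3.5-122B-A10B-Q4_K_M-00001-of-00002.gguf \
  Qwen3.5-122B-A10B-Q4_K_M.gguf

# Serve the merged file, offloading what fits to the GPU,
# then send it a long prompt; per-request timings are printed in the log.
./build/bin/llama-server -m Qwen3.5-122B-A10B-Q4_K_M.gguf \
  -ngl 999 -c 8192 --port 8080
```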
Here is the result for unsloth/Qwen3.5-122B-A10B-GGUF:Q4_K_M, a 76.5 GB model:
```
prompt eval time =   4429.15 ms /   458 tokens (  9.67 ms per token, 103.41 tokens per second)
       eval time = 239847.07 ms /  3638 tokens ( 65.93 ms per token,  15.17 tokens per second)
      total time = 244276.22 ms /  4096 tokens
slot release: id 1 | task 132 | stop processing: n_tokens = 4095, truncated = 1
srv  update_slots: all slots are idle
```