
Just some Qwen 3.5 benchmarks for an MI60 32GB VRAM GPU, from 4B to 122B at varying quants and various context depths (0, 5000, 20000, 100000). Performs pretty well despite its age.

Reddit r/LocalLLaMA / 3/12/2026

📰 News · Tools & Practical Usage · Models & Research

Key Points

  • Benchmark run on an MI60 32GB VRAM system with ROCm backend and Flash Attention enabled, detailing hardware and software setup.
  • Benchmarks span Qwen 3.5 models from 4B to 122B parameters, across varying quantizations and context depths (0, 5000, 20000, 100000).
  • Throughput results (t/s) vary by configuration, with values ranging from tens to over a thousand depending on test type and settings.
  • Conclusion highlights that the MI60 setup delivers solid performance despite its age for these AI-inference benchmarks.

llama.cpp ROCm Benchmarks – MI60 32GB VRAM

Hardware: MI60 32GB VRAM, i9-14900K, 96GB DDR5-5600
Build: 43e1cbd6c (8255)
Backend: ROCm, Flash Attention enabled

Qwen 3.5 4B Q4_K (Medium)

| model | size | params | backend | ngl | fa | test | t/s |
| ----- | ---: | -----: | ------- | --: | -: | ---- | ---: |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | pp512 | 1232.35 ± 1.05 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | tg128 | 49.48 ± 0.03 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | pp512 @ d5000 | 1132.48 ± 2.11 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | tg128 @ d5000 | 48.47 ± 0.06 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | pp512 @ d20000 | 913.43 ± 1.37 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | tg128 @ d20000 | 46.67 ± 0.08 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | pp512 @ d100000 | 410.46 ± 1.30 |
| qwen35 4B Q4_K - Medium | 2.70 GiB | 4.21 B | ROCm | 999 | 1 | tg128 @ d100000 | 39.56 ± 0.06 |
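The table above makes it easy to quantify how much prompt processing slows down as the context fills. A small illustrative script (the pp512 numbers are copied straight from the rows above):

```python
# Prompt-processing throughput (t/s) for Qwen 3.5 4B Q4_K_M at each
# context depth, copied from the llama-bench pp512 rows above.
pp512 = {0: 1232.35, 5000: 1132.48, 20000: 913.43, 100000: 410.46}

baseline = pp512[0]
for depth, tps in pp512.items():
    drop = 100 * (1 - tps / baseline)
    print(f"d={depth:>6}: {tps:8.2f} t/s ({drop:4.1f}% below empty-context speed)")
```

At a 100k-token depth, prompt processing runs at roughly a third of the empty-context rate (~66.7% slower), while generation (tg128) only falls from 49.48 to 39.56 t/s.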

Qwen 3.5 4B Q8_0

| model | size | params | backend | ngl | fa | test | t/s |
| ----- | ---: | -----: | ------- | --: | -: | ---- | ---: |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | pp512 | 955.33 ± 1.66 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | tg128 | 43.02 ± 0.06 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | pp512 @ d5000 | 887.37 ± 2.23 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | tg128 @ d5000 | 42.32 ± 0.06 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | pp512 @ d20000 | 719.60 ± 1.60 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | tg128 @ d20000 | 39.25 ± 0.19 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | pp512 @ d100000 | 370.46 ± 1.17 |
| qwen35 4B Q8_0 | 5.53 GiB | 4.21 B | ROCm | 999 | 1 | tg128 @ d100000 | 33.47 ± 0.27 |
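Comparing the two 4B quants above shows why the smaller quant generates faster: single-GPU token generation is largely memory-bandwidth-bound, so the 2.70 GiB Q4_K_M weights stream through faster than the 5.53 GiB Q8_0 ones. A quick calculation on the tg128 numbers from the tables:

```python
# Empty-context tg128 (token generation) throughput for the two 4B quants,
# taken from the tables above.
tg128 = {"Q4_K_M (2.70 GiB)": 49.48, "Q8_0 (5.53 GiB)": 43.02}

speedup = tg128["Q4_K_M (2.70 GiB)"] / tg128["Q8_0 (5.53 GiB)"]
print(f"Q4_K_M generates ~{100 * (speedup - 1):.0f}% faster than Q8_0")
```

That works out to roughly a 15% generation speedup for Q4_K_M, at the cost of a lower-precision quantization.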

Qwen 3.5 9B Q4_K (Medium)

| model | size | params | backend | ngl | fa | test | t/s |
| ----- | ---: | -----: | ------- | --: | -: | ---- | ---: |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | pp512 | 767.11 ± 5.37 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | tg128 | 41.23 ± 0.39 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | pp512 @ d5000 | 687.61 ± 4.25 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | tg128 @ d5000 | 39.08 ± 0.11 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | pp512 @ d20000 | 569.65 ± 20.82 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | tg128 @ d20000 | 37.58 ± 0.21 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | pp512 @ d100000 | 337.25 ± 2.22 |
| qwen35 9B Q4_K - Medium | 5.55 GiB | 8.95 B | ROCm | 999 | 1 | tg128 @ d100000 | 32.25 ± 0.33 |

Qwen 3.5 9B Q8_0

| model | size | params | backend | ngl | fa | test | t/s |
| ----- | ---: | -----: | ------- | --: | -: | ---- | ---: |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | pp512 | 578.33 ± 0.63 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | tg128 | 30.25 ± 1.09 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | pp512 @ d5000 | 527.08 ± 11.25 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | tg128 @ d5000 | 28.38 ± 0.12 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | pp512 @ d20000 | 465.11 ± 2.30 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | tg128 @ d20000 | 27.38 ± 0.57 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | pp512 @ d100000 | 291.10 ± 0.87 |
| qwen35 9B Q8_0 | 12.07 GiB | 8.95 B | ROCm | 999 | 1 | tg128 @ d100000 | 24.80 ± 0.11 |

Qwen 3.5 27B Q5_K (Medium)

| model | size | params | backend | ngl | fa | test | t/s |
| ----- | ---: | -----: | ------- | --: | -: | ---- | ---: |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | pp512 | 202.53 ± 1.97 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | tg128 | 12.87 ± 0.27 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | pp512 @ d5000 | 179.92 ± 0.40 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | tg128 @ d5000 | 12.26 ± 0.03 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | pp512 @ d20000 | 158.60 ± 0.74 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | tg128 @ d20000 | 11.48 ± 0.06 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | pp512 @ d100000 | 99.18 ± 0.66 |
| qwen35 27B Q5_K - Medium | 18.78 GiB | 26.90 B | ROCm | 999 | 1 | tg128 @ d100000 | 8.31 ± 0.07 |

Qwen 3.5 MoE 35B.A3B Q4_K (Medium)

| model | size | params | backend | ngl | fa | test | t/s |
| ----- | ---: | -----: | ------- | --: | -: | ---- | ---: |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | pp512 | 851.50 ± 20.61 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | tg128 | 40.37 ± 0.13 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | pp512 @ d5000 | 793.63 ± 2.93 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | tg128 @ d5000 | 39.50 ± 0.42 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | pp512 @ d20000 | 625.67 ± 4.06 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | tg128 @ d20000 | 39.22 ± 0.02 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | pp512 @ d100000 | 304.23 ± 1.19 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.70 GiB | 34.66 B | ROCm | 999 | 1 | tg128 @ d100000 | 36.10 ± 0.03 |

Qwen 3.5 MoE 35B.A3B Q6_K

| model | size | params | backend | ngl | fa | test | t/s |
| ----- | ---: | -----: | ------- | --: | -: | ---- | ---: |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | pp512 | 855.91 ± 2.38 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | tg128 | 40.10 ± 0.13 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | pp512 @ d5000 | 747.68 ± 84.40 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | tg128 @ d5000 | 39.56 ± 0.06 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | pp512 @ d20000 | 617.59 ± 3.76 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | tg128 @ d20000 | 38.76 ± 0.45 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | pp512 @ d100000 | 294.08 ± 20.35 |
| qwen35moe 35B.A3B Q6_K | 26.86 GiB | 34.66 B | ROCm | 999 | 1 | tg128 @ d100000 | 35.54 ± 0.53 |
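One pattern worth pulling out of the tables: the MoE model holds its generation speed at depth far better than the similarly sized dense model, since only ~3B parameters are active per token. A quick comparison using the tg128 numbers from the 27B dense and 35B.A3B tables:

```python
# Token-generation throughput (t/s) at empty context vs. 100k-token depth,
# copied from the 27B dense and 35B.A3B MoE tables above.
tg = {
    "27B dense Q5_K_M":   {"d0": 12.87, "d100000": 8.31},
    "35B.A3B MoE Q4_K_M": {"d0": 40.37, "d100000": 36.10},
}

for name, r in tg.items():
    drop = 100 * (1 - r["d100000"] / r["d0"])
    print(f"{name}: {r['d0']:.2f} -> {r['d100000']:.2f} t/s ({drop:.1f}% slowdown)")
```

The dense 27B loses about 35% of its generation speed at 100k depth, while the MoE loses only about 11% and stays above 35 t/s throughout.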

Lastly - a model larger than my VRAM

This one I had to do a little differently, since llama-bench wasn't playing well with the sharded downloads. I merged the shards, but then I couldn't use all the flags I wanted with llama-bench, so I used llama-server instead and gave it a healthy prompt.

So here is the result for unsloth/Qwen3.5-122B-A10B-GGUF:Q4_K_M, a 76.5 GB model:

prompt eval time =   4429.15 ms /   458 tokens (  9.67 ms per token, 103.41 tokens per second)
       eval time = 239847.07 ms /  3638 tokens ( 65.93 ms per token,  15.17 tokens per second)
      total time = 244276.22 ms /  4096 tokens
slot release: id 1 | task 132 | stop processing: n_tokens = 4095, truncated = 1
srv update_slots: all slots are idle
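The per-second figures in the llama-server log are just tokens divided by elapsed seconds; a quick sanity check of the arithmetic from the timings above:

```python
# Recompute throughput from the llama-server timing summary:
# tokens / (milliseconds / 1000).
prompt_tps = 458 / (4429.15 / 1000)      # prompt eval
gen_tps = 3638 / (239847.07 / 1000)      # generation (eval)
print(f"prompt: {prompt_tps:.2f} t/s, generation: {gen_tps:.2f} t/s")
```

Both come out to the logged values (103.41 and 15.17 t/s), so the 122B MoE still generates at a usable ~15 t/s despite the model spilling well past the 32GB of VRAM.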
submitted by /u/FantasyMaster85