AI Navigate

Is memory speed everything? A quick comparison between the RTX 6000 96GB and the AMD W7800 48GB x2.

Reddit r/LocalLLaMA / 3/17/2026

💬 Opinion · Signals & Early Trends · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The post compares the RTX 6000 96GB (1,792 GB/s) with two AMD W7800 48GB GPUs (864 GB/s each) and argues memory bandwidth is the key determinant for AI inference throughput.
  • In an empirical test using GPT-120b, the author reports 87.45 tokens/sec on ROCm vs 177.74 tokens/sec on CUDA, with the ratio roughly matching the VRAM bandwidth ratio, implying VRAM speed drives throughput.
  • When using three GPUs, throughput tends toward the average rather than following the slowest card, around 130-135 tokens/sec with Vulkan, reinforcing memory speed as the bottleneck.
  • The author concludes memory speed is practically everything for token throughput and even speculates that far higher bandwidth (e.g., 22 TB/s) could push GPT-120b to ~2000 tokens/sec, albeit at a much higher cost than a W7800.
  • The piece also references common questions about GPU memory size (e.g., RTX 5060ti with 16GB) and frames the bandwidth argument as the deciding factor in real-world AI workloads.

I recently purchased two 48GB AMD W7800 cards. At €1,475 + VAT each, it seemed like a good deal compared to slower or far more expensive alternatives.

864GB/sec (per W7800) vs. 1,792GB/sec (RTX 6000) is a big difference, but with this setup I can fit Deepseek and GLM 5 into VRAM at about 25-30 tokens per second. More of an academic test than anything else.

Let's get to the point: I compared tokens per second across the two cards, using CUDA on the RTX 6000 and ROCm on the AMD.

Using GPT-120b with the same prompt in LM Studio (with llama.cpp I would have gotten more tokens/sec, but that's another topic):

87.45 tokens/sec (ROCm)

177.74 tokens/sec (CUDA)

Taking the ratios:

864 / 1,792 = 0.482

87.45 / 177.74 = 0.492

This very rough empirical exercise suggests that VRAM speed is practically everything: the throughput ratio almost exactly matches the ratio of the VRAM bandwidths.
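The ratio check above can be reproduced in a few lines. This is just the post's own numbers; the two bandwidth figures are the ones the author quotes for each card.

```python
# Quick check of the post's argument: if token throughput is
# bandwidth-bound, the tokens/sec ratio should track the GB/s ratio.
bw_w7800 = 864      # GB/s, single W7800 48GB (post's figure)
bw_rtx6000 = 1792   # GB/s, RTX 6000 96GB (post's figure)

tps_rocm = 87.45    # measured tokens/sec on ROCm (W7800)
tps_cuda = 177.74   # measured tokens/sec on CUDA (RTX 6000)

bw_ratio = bw_w7800 / bw_rtx6000
tps_ratio = tps_rocm / tps_cuda

print(f"bandwidth ratio:  {bw_ratio:.3f}")   # 0.482
print(f"throughput ratio: {tps_ratio:.3f}")  # 0.492
```

The two ratios agree to within about 2%, which is the whole basis of the post's claim.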

I'm writing this post because I keep seeing questions like "is an RTX 5060 Ti with 16GB of VRAM enough?" I can tell you that at 448GB/sec it will run about half as fast as a 48GB W7800, which draws 300W. The RTX 3090 24GB has 936GB/sec and will run slightly faster than the W7800.

However, it's very interesting that when combining all three cards, the speed doesn't drop to that of the slowest card but tends toward the average: 130-135 tokens/sec using Vulkan.
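A toy model makes the "tends toward the average" observation concrete. Assume (this split is my guess, not the author's measurement) that the weights are divided across the three cards in proportion to VRAM, and compare two predictions: a VRAM-weighted average of the bandwidths, and the weighted harmonic mean that a strictly sequential pipeline would imply.

```python
# Toy model for the three-card run (RTX 6000 + 2x W7800).
# Assumption: weights split across cards in proportion to VRAM.
# GB/s and tokens/sec figures are the post's; the split is hypothetical.
cards = [(96, 1792), (48, 864), (48, 864)]  # (VRAM GB, bandwidth GB/s)
tps_ref, bw_ref = 177.74, 1792              # measured single-card point

total_vram = sum(v for v, _ in cards)
shares = [v / total_vram for v, _ in cards]  # fraction of weights per card

# VRAM-weighted average bandwidth vs. weighted harmonic mean
# (the latter is what a strictly sequential per-token pipeline gives).
weighted_avg_bw = sum(s * bw for s, (_, bw) in zip(shares, cards))
harmonic_bw = 1 / sum(s / bw for s, (_, bw) in zip(shares, cards))

tps_avg = tps_ref * weighted_avg_bw / bw_ref
tps_harm = tps_ref * harmonic_bw / bw_ref
print(f"weighted-average prediction:    {tps_avg:.0f} tok/s")   # 132
print(f"sequential-pipeline prediction: {tps_harm:.0f} tok/s")  # 116
```

The observed 130-135 tok/s lands right at the weighted-average prediction, consistent with the post's remark that throughput tends toward the average rather than the slowest card.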

The final suggestion is therefore to look at memory speed. If Rubin has 22TB/sec, we'll see something like 2,000 tokens/sec on GPT-120b... But I'm sure it won't cost €1,475 + VAT like a W7800.
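The ~2,000 tok/s guess follows from the same linear-in-bandwidth assumption, scaling the measured CUDA result up from 1,792 GB/s to a hypothetical 22 TB/s. This is a pure back-of-envelope extrapolation, not a benchmark.

```python
# Bandwidth-roofline extrapolation behind the Rubin guess: assume
# tokens/sec scales linearly with memory bandwidth (as the post's
# two-card comparison suggests) and scale up the measured CUDA point.
tps_cuda, bw_cuda = 177.74, 1792   # measured tokens/sec at GB/s
bw_rubin = 22_000                  # GB/s (the post's 22 TB/s figure)

tps_rubin = tps_cuda * bw_rubin / bw_cuda
print(f"projected: ~{tps_rubin:.0f} tok/s")  # ~2182, roughly the post's ~2000
```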

submitted by /u/LegacyRemaster