I recently purchased two 48GB AMD W7800 cards. At €1,475 + VAT each, they seemed like a good deal compared to using slower but very expensive RAM. 864 GB/s vs. 1,792 GB/s is a big difference, but with this setup I can fit Deepseek and GLM 5 into VRAM at about 25-30 tokens per second. More of an academic test than anything else.

Let's get to the point: I compared the tokens per second of the two setups, using CUDA for the RTX 6000 and ROCm on the AMD cards. Running GPT-120b with the same prompt in LM Studio (on llama.cpp I would have gotten more tokens, but that's another topic):

- 87.45 tokens/sec on ROCm
- 177.74 tokens/sec on CUDA

Taking the ratios: 864/1,792 = 0.482 for bandwidth and 87.45/177.74 = 0.492 for throughput. This very empirical exercise suggests that VRAM speed is practically everything, since the throughput ratio tracks the bandwidth ratio almost exactly.

I'm writing this post because I keep seeing questions like "is an RTX 5060 Ti with 16GB of RAM enough?" I can tell you that at 448 GB/s it will run half as fast as a 48GB W7800, which needs 300W. The RTX 3090 24GB has 936 GB/s and will run slightly faster. Interestingly, when pairing the three cards, the speed doesn't drop to the slowest card but tends toward the average: 130-135 tokens/sec using Vulkan.

The final suggestion is therefore to look at memory speed. If Rubin really has 22 TB/s, we'll see something like 2,000 tokens/sec on GPT-120b... But I'm sure it won't cost €1,475 + VAT like a W7800.
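The back-of-the-envelope arithmetic above can be checked in a few lines. This is only a sketch of the linear-scaling argument, using exactly the bandwidth figures and measured token rates quoted in the post:

```python
# Bandwidth-vs-throughput ratio check, using the numbers quoted in the post.

BW_W7800 = 864.0     # GB/s, a single AMD W7800
BW_RTX6000 = 1792.0  # GB/s, RTX 6000 96GB

TPS_ROCM = 87.45     # tokens/s measured on the W7800 (ROCm)
TPS_CUDA = 177.74    # tokens/s measured on the RTX 6000 (CUDA)

bw_ratio = BW_W7800 / BW_RTX6000   # ~0.482
tps_ratio = TPS_ROCM / TPS_CUDA    # ~0.492

# If throughput scales linearly with bandwidth, the CUDA number
# predicts the ROCm number to within a few percent:
predicted_rocm = TPS_CUDA * bw_ratio                       # ~85.7 tok/s vs 87.45 measured
error_pct = 100 * (TPS_ROCM - predicted_rocm) / TPS_ROCM   # ~2%

# The same linear scaling applied to a hypothetical 22 TB/s part
# (the post's Rubin speculation) lands near the ~2,000 tok/s guess:
predicted_rubin = TPS_CUDA * (22000.0 / BW_RTX6000)        # ~2,180 tok/s

print(f"bandwidth ratio {bw_ratio:.3f}, throughput ratio {tps_ratio:.3f}")
print(f"predicted ROCm {predicted_rocm:.1f} tok/s ({error_pct:.1f}% off measured)")
print(f"22 TB/s extrapolation: {predicted_rubin:.0f} tok/s")
```

The ~2% gap between prediction and measurement is what makes the post's "bandwidth is everything" claim persuasive, at least for this one model and prompt.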
Is memory speed everything? A quick comparison between the RTX 6000 96GB and the AMD W7800 48GB x2.
Reddit r/LocalLLaMA / 3/17/2026
Key Points
- The post compares the RTX 6000 96GB (1,792 GB/s) with two AMD W7800 48GB GPUs (864 GB/s each) and argues memory bandwidth is the key determinant of LLM inference throughput.
- In an empirical test using GPT-120b, the author reports 87.45 tokens/sec on ROCm vs 177.74 tokens/sec on CUDA; the throughput ratio (0.492) roughly matches the VRAM bandwidth ratio (0.482), implying VRAM speed drives throughput.
- When using three GPUs, throughput tends toward the average rather than following the slowest card, around 130-135 tokens/sec with Vulkan, reinforcing memory speed as the bottleneck.
- The author concludes memory speed is practically everything for token throughput and even speculates that far higher bandwidth (e.g., 22 TB/s) could push GPT-120b to ~2000 tokens/sec, albeit at a much higher cost than a W7800.
- The piece also references common questions about GPU memory size (e.g., RTX 5060ti with 16GB) and frames the bandwidth argument as the deciding factor in real-world AI workloads.
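The bandwidth-bound claim in the key points can also be sanity-checked with a standard roofline estimate: per generated token, a memory-bound decoder must stream the active weights from VRAM at least once, so tokens/sec is capped at bandwidth divided by active-weight bytes. The figures below are assumptions, not from the post: GPT-120b is treated as a mixture-of-experts model with roughly 5.1B active parameters at ~4-bit precision (about 0.5 bytes/parameter), and KV-cache and attention traffic are ignored.

```python
# Roofline sanity check (assumed figures, not from the post):
# a memory-bound decoder reads each active weight once per token,
# so tokens/s <= bandwidth / bytes_of_active_weights.

ACTIVE_PARAMS = 5.1e9    # assumed active parameters per token (MoE)
BYTES_PER_PARAM = 0.5    # assumed ~4-bit quantized weights
BANDWIDTH = 1792e9       # RTX 6000 96GB, bytes/s

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM  # ~2.55 GB per token
roofline_tps = BANDWIDTH / bytes_per_token         # ~700 tok/s ceiling

measured_tps = 177.74                              # CUDA result from the post
efficiency = measured_tps / roofline_tps           # ~25% of the ceiling

print(f"roofline ceiling: {roofline_tps:.0f} tok/s")
print(f"measured {measured_tps} tok/s -> {efficiency:.0%} of roofline")
```

The gap between the ceiling and the measured rate plausibly comes from KV-cache reads, attention compute, and kernel overhead; the point is that the ceiling itself is set by bandwidth, which is why the ratio experiment in the post tracks bandwidth so closely.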