I recently purchased two 48GB AMD W7800 cards. At €1,475 + VAT each, they seemed like a good deal compared to using slower but very expensive RAM. 864 GB/s vs. 1,792 GB/s is a big difference, but with this setup I can fit Deepseek and GLM 5 into VRAM at about 25-30 tokens per second. More of an academic test than anything else.

Let's get to the point: I compared the tokens per second of the two setups, using CUDA for the RTX 6000 and ROCm on the AMD cards. Running GPT-120b with the same prompt in LM Studio (on llama.cpp I would have gotten more tokens, but that's another topic):

- 87.45 tokens/sec on ROCm
- 177.74 tokens/sec on CUDA

Taking the ratios: 864/1,792 = 0.482 for bandwidth and 87.45/177.74 = 0.492 for throughput. This very empirical exercise suggests that VRAM speed is practically everything, since the throughput ratio tracks the bandwidth ratio almost exactly.

I'm writing this post because I keep seeing questions like "is an RTX 5060 Ti with 16GB of RAM enough?" I can tell you that at 448 GB/s it will run half as fast as a 48GB W7800, which needs 300W. The RTX 3090 24GB has 936 GB/s and will run slightly faster. Interestingly, when pairing the three cards, the speed doesn't drop to the slowest card but tends toward the average: 130-135 tokens/sec using Vulkan.

The final suggestion is therefore to look at memory speed. If Rubin really has 22 TB/s, we'll see something like 2,000 tokens/sec on GPT-120b... But I'm sure it won't cost €1,475 + VAT like a W7800.
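The back-of-the-envelope arithmetic above can be checked in a few lines. This is only a sketch of the linear-scaling argument, using exactly the bandwidth figures and measured token rates quoted in the post:

```python
# Bandwidth-vs-throughput ratio check, using the numbers quoted in the post.

BW_W7800 = 864.0     # GB/s, a single AMD W7800
BW_RTX6000 = 1792.0  # GB/s, RTX 6000 96GB

TPS_ROCM = 87.45     # tokens/s measured on the W7800 (ROCm)
TPS_CUDA = 177.74    # tokens/s measured on the RTX 6000 (CUDA)

bw_ratio = BW_W7800 / BW_RTX6000   # ~0.482
tps_ratio = TPS_ROCM / TPS_CUDA    # ~0.492

# If throughput scales linearly with bandwidth, the CUDA number
# predicts the ROCm number to within a few percent:
predicted_rocm = TPS_CUDA * bw_ratio                       # ~85.7 tok/s vs 87.45 measured
error_pct = 100 * (TPS_ROCM - predicted_rocm) / TPS_ROCM   # ~2%

# The same linear scaling applied to a hypothetical 22 TB/s part
# (the post's Rubin speculation) lands near the ~2,000 tok/s guess:
predicted_rubin = TPS_CUDA * (22000.0 / BW_RTX6000)        # ~2,180 tok/s

print(f"bandwidth ratio {bw_ratio:.3f}, throughput ratio {tps_ratio:.3f}")
print(f"predicted ROCm {predicted_rocm:.1f} tok/s ({error_pct:.1f}% off measured)")
print(f"22 TB/s extrapolation: {predicted_rubin:.0f} tok/s")
```

The ~2% gap between prediction and measurement is what makes the post's "bandwidth is everything" claim persuasive, at least for this one model and prompt.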
Is memory speed everything? A quick comparison between the RTX 6000 96GB and the AMD W7800 48GB x2.
Reddit r/LocalLLaMA / 3/17/2026
Key Points
- The post compares the RTX 6000 96GB (1,792 GB/s) with two AMD W7800 48GB GPUs (864 GB/s each) and argues memory bandwidth is the key determinant of LLM inference throughput.
- In an empirical test using GPT-120b, the author reports 87.45 tokens/sec on ROCm vs 177.74 tokens/sec on CUDA; the throughput ratio (0.492) roughly matches the VRAM bandwidth ratio (0.482), implying VRAM speed drives throughput.
- When using three GPUs, throughput tends toward the average rather than following the slowest card, around 130-135 tokens/sec with Vulkan, reinforcing memory speed as the bottleneck.
- The author concludes memory speed is practically everything for token throughput and even speculates that far higher bandwidth (e.g., 22 TB/s) could push GPT-120b to ~2000 tokens/sec, albeit at a much higher cost than a W7800.
- The piece also references common questions about GPU memory size (e.g., RTX 5060ti with 16GB) and frames the bandwidth argument as the deciding factor in real-world AI workloads.
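The bandwidth-bound claim in the key points can also be sanity-checked with a standard roofline estimate: per generated token, a memory-bound decoder must stream the active weights from VRAM at least once, so tokens/sec is capped at bandwidth divided by active-weight bytes. The figures below are assumptions, not from the post: GPT-120b is treated as a mixture-of-experts model with roughly 5.1B active parameters at ~4-bit precision (about 0.5 bytes/parameter), and KV-cache and attention traffic are ignored.

```python
# Roofline sanity check (assumed figures, not from the post):
# a memory-bound decoder reads each active weight once per token,
# so tokens/s <= bandwidth / bytes_of_active_weights.

ACTIVE_PARAMS = 5.1e9    # assumed active parameters per token (MoE)
BYTES_PER_PARAM = 0.5    # assumed ~4-bit quantized weights
BANDWIDTH = 1792e9       # RTX 6000 96GB, bytes/s

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM  # ~2.55 GB per token
roofline_tps = BANDWIDTH / bytes_per_token         # ~700 tok/s ceiling

measured_tps = 177.74                              # CUDA result from the post
efficiency = measured_tps / roofline_tps           # ~25% of the ceiling

print(f"roofline ceiling: {roofline_tps:.0f} tok/s")
print(f"measured {measured_tps} tok/s -> {efficiency:.0%} of roofline")
```

The gap between the ceiling and the measured rate plausibly comes from KV-cache reads, attention compute, and kernel overhead; the point is that the ceiling itself is set by bandwidth, which is why the ratio experiment in the post tracks bandwidth so closely.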