AMD puts out new slottable GPU for AI-curious enterprises

The Register / 5/7/2026


Key Points

  • AMD has introduced the MI350P, a slottable PCIe-based Instinct GPU aimed at enterprise customers exploring AI workloads.
  • The MI350P is built around 144 GB of HBM3e memory and is rated at up to 4.6 petaFLOPS of FP4 compute.
  • AMD positions the dual-slot form factor as a practical way for enterprises to add AI acceleration without adopting fully data-center-style GPU platforms.
  • The announcement signals AMD’s continued focus on expanding AI hardware options tailored to enterprise deployment needs.

Systems


MI350P packs 144 GB of HBM3e and up to 4.6 petaFLOPS of FP4 grunt into a dual-slot card

Tobias Mann, Systems Editor

AMD hopes to win over enterprise AI customers with a more affordable datacenter GPU that can drop into conventional air-cooled servers.

Announced on Thursday, the MI350P is the House of Zen’s first PCIe-based Instinct accelerator since the MI210 debuted all the way back in 2022.

Until now, AMD’s best GPUs have only been available in packs of eight and used socketed OAM modules that weren’t compatible with most server platforms.


By comparison, the MI350P can slot into just about any 19-inch pizza box design that offers enough power and airflow, making it a much easier sell for enterprises dipping their toes into on-prem AI for the first time.


The 600-watt, dual-slot card is essentially an MI350X that's been cut in half. That means the CDNA-based GPU packs 4.6 petaFLOPS of FP4 compute and 144 GB of VRAM spread across four HBM3e stacks, delivering a respectable 4 TB/s of memory bandwidth.

AMD supports configurations ranging from one to eight MI350Ps, though a lack of high-speed interconnects means the cards are limited to PCIe 5.0 speeds (128 GB/s) for chip-to-chip communications, which could hold them back on larger models.
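
To put that 128 GB/s in perspective, here's a back-of-envelope sketch (not a benchmark) of how long it takes just to move a large set of weights between two cards. The 70 GB figure is a purely illustrative assumption, roughly a 70-billion-parameter model at FP8, and not something AMD has quoted:

```python
# Back-of-envelope: time to move a hypothetical 70 GB weight blob
# between two GPUs over each link. Figures are peak, not measured.
WEIGHTS_GB = 70  # illustrative assumption: ~70B parameters at FP8

links_gbps = {
    "PCIe 5.0 x16 (MI350P peer-to-peer)": 128,  # per the article
    "NVLink bridge (H200 NVL)": 900,            # per the spec table below
}

for name, bw in links_gbps.items():
    print(f"{name}: {WEIGHTS_GB / bw * 1000:.0f} ms")
# PCIe 5.0: ~547 ms vs NVLink: ~78 ms -- a 7x gap on chip-to-chip traffic
```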

AMD hasn’t shared pricing for the cards just yet, but at least on paper, the MI350P is well positioned to compete with either Nvidia’s H200 NVL or RTX Pro 6000 Blackwell PCIe cards.

Compared to the 141 GB H200, the MI350P promises about 38 percent higher peak performance at FP8, while eking out a narrow VRAM capacity advantage. 

But the H200 does pull ahead when it comes to memory bandwidth. With six HBM3e stacks to the MI350P’s four, the nearly two-year-old card’s memory is still about 20 percent faster.

Nvidia's H200 also supports high-speed chip-to-chip communications over NVLink, while the MI350P goes without AMD's equivalent, Infinity Fabric.

However, all this assumes you can still find H200 NVLs in the wild.

Since last summer, Nvidia has been pushing its RTX Pro 6000 Server cards on enterprise customers. As of writing, the card is Nvidia's most powerful Blackwell-based accelerator offered in a PCIe form factor.


Against the RTX Pro 6000, price becomes a bigger factor for the MI350P than performance. Workstation versions of the RTX Pro, which swap the passive cooler for an active one, routinely sell for between $8,000 and $10,000 apiece, making it one of Nvidia's more affordable datacenter-class GPUs.

Depending on how pricing shakes out, AMD may have to push hard to be competitive.

Having said that, the MI350P is still the better-specced part, delivering 2.3x higher peak FLOPS, 2.5x the memory bandwidth, and 50 percent more VRAM than the RTX Pro.

|                            | AMD MI350P                 | Nvidia H200 NVL                               | Nvidia RTX Pro 6000 Server |
|----------------------------|----------------------------|-----------------------------------------------|----------------------------|
| BF16                       | 1,150 TFLOPS               | 836 TFLOPS                                    | 500 TFLOPS                 |
| FP16                       | 1,150 TFLOPS               | 836 TFLOPS                                    | 500 TFLOPS                 |
| FP8                        | 2,300 TFLOPS               | 1,671 TFLOPS                                  | 1,000 TFLOPS               |
| MXFP8                      | 2,300 TFLOPS               | -                                             | 1,000 TFLOPS               |
| MXFP4                      | 4,600 TFLOPS               | -                                             | 2,000 TFLOPS               |
| Memory capacity            | 144 GB HBM3e               | 141 GB HBM3e                                  | 96 GB GDDR7                |
| Memory bandwidth           | 4.0 TB/s                   | 4.8 TB/s                                      | 1.6 TB/s                   |
| GPU instances              | Up to 4 @ 36 GB each       | Up to 7 @ 16.5 GB each                        | Up to 4 @ 24 GB each       |
| Scale-up interconnect      | Not supported              | 2- or 4-way NVLink bridge at 900 GB/s per GPU | Not supported              |
| Form factor                | FHFL dual-slot, air-cooled | FHFL dual-slot, air-cooled                    | FHFL dual-slot, air-cooled |
| Max total board power (TBP)| 600 W (450 W configurable) | 600 W (configurable)                          | 600 W (configurable)       |
| PCIe host interface        | x16 PCIe Gen 5 at 128 GB/s | x16 PCIe Gen 5 at 128 GB/s                    | x16 PCIe Gen 5 at 128 GB/s |
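
The headline ratios quoted above fall straight out of that table. A quick Python sanity check, using the peak figures as listed:

```python
# Sanity-checking the spec ratios cited in the article against the table.
mi350p = {"fp8": 2300, "fp4": 4600, "mem_gb": 144, "bw_tbs": 4.0}
h200   = {"fp8": 1671, "mem_gb": 141, "bw_tbs": 4.8}
rtx    = {"fp4": 2000, "mem_gb": 96,  "bw_tbs": 1.6}

print(f"FP8 vs H200:  {mi350p['fp8'] / h200['fp8'] - 1:+.0%}")    # ~+38%
print(f"BW vs H200:   {h200['bw_tbs'] / mi350p['bw_tbs'] - 1:+.0%} in the H200's favor")  # ~+20%
print(f"FP4 vs RTX:   {mi350p['fp4'] / rtx['fp4']:.1f}x")         # 2.3x
print(f"BW vs RTX:    {mi350p['bw_tbs'] / rtx['bw_tbs']:.1f}x")   # 2.5x
print(f"VRAM vs RTX:  {mi350p['mem_gb'] / rtx['mem_gb'] - 1:+.0%}")  # +50%
```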

Now, this all assumes peak FLOPS and memory bandwidth, which is rarely realistic. The tensors used by AI workloads are rarely the ideal shape for squeezing the maximum number of FLOPS out of a chip. This is why we run Maximum Achievable MatMul FLOPS (MAMF) and BabelStream memory bandwidth benchmarks as part of our AI test suite.
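
For a flavor of what an MAMF-style measurement involves, here's a minimal PyTorch sketch that times a large BF16 matmul and converts the elapsed time into achieved TFLOPS. The shapes, warmup, and iteration counts are arbitrary assumptions, and our actual test suite is rather more involved:

```python
# Minimal MAMF-style sketch: time a big BF16 matmul, report achieved TFLOPS.
import torch

M = N = K = 8192  # arbitrary; a real sweep tries many shapes
a = torch.randn(M, K, dtype=torch.bfloat16, device="cuda")
b = torch.randn(K, N, dtype=torch.bfloat16, device="cuda")

for _ in range(10):  # warmup so clocks and caches settle
    torch.matmul(a, b)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 100
start.record()
for _ in range(iters):
    torch.matmul(a, b)
end.record()
torch.cuda.synchronize()

secs = start.elapsed_time(end) / 1000 / iters  # elapsed_time is in ms
tflops = 2 * M * N * K / secs / 1e12           # a matmul costs 2*M*N*K FLOPs
print(f"Achieved: {tflops:.0f} TFLOPS")
```

PyTorch's ROCm builds reuse the "cuda" device name, so the same sketch runs on AMD hardware without changes.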

AMD seems to understand that peak FLOPS don't translate cleanly into real-world performance: in the marketing materials shared with El Reg prior to publication, it compared the MI350P's theoretical peaks against its real-world delivered performance.

| MI350P           | Delivered    | Peak         |
|------------------|--------------|--------------|
| BF16             | 713 TFLOPS   | 1,150 TFLOPS |
| FP16             | 672 TFLOPS   | 1,150 TFLOPS |
| FP8              | 1,529 TFLOPS | 2,300 TFLOPS |
| MXFP8            | 1,327 TFLOPS | 2,300 TFLOPS |
| MXFP6            | 1,804 TFLOPS | 4,600 TFLOPS |
| MXFP4            | 2,299 TFLOPS | 4,600 TFLOPS |
| Memory capacity  | 144 GB HBM3e | 144 GB HBM3e |
| Memory bandwidth | 3.6 TB/s     | 4.0 TB/s     |
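
Run AMD's figures through a quick loop and utilization lands anywhere between roughly 40 and 66 percent of peak, depending on the data type:

```python
# Delivered-versus-peak utilization, straight from AMD's table above.
figures = {  # dtype: (delivered TFLOPS, peak TFLOPS)
    "BF16": (713, 1150), "FP16": (672, 1150),
    "FP8": (1529, 2300), "MXFP8": (1327, 2300),
    "MXFP6": (1804, 4600), "MXFP4": (2299, 4600),
}
for dtype, (delivered, peak) in figures.items():
    print(f"{dtype:>6}: {delivered / peak:.0%} of peak")
# BF16 62%, FP16 58%, FP8 66%, MXFP8 58%, MXFP6 39%, MXFP4 50%
```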

It’d be nice to see Nvidia and others adopt similar practices regarding accelerator performance claims, though we suspect getting everyone to agree on the best way to measure this might not be easy.

The MI350P’s launch comes as AMD prepares to address a very different and likely more lucrative segment with its first rack-scale compute platform, codenamed Helios.

That system is due out in the second half of the year and is aimed primarily at large hyperscale and neocloud deployments. It packs 72 of AMD's all-new MI455X GPUs into a single double-wide OCP rack that behaves like one enormous accelerator.


The platform will be AMD’s first crack at Nvidia’s NVL72 racks, which launched alongside its Blackwell generation nearly two years ago. ®