[P] Gemma 4 running on NVIDIA B200 and AMD MI355X from the same inference stack, 15% throughput gain over vLLM on Blackwell

Reddit r/MachineLearning / 4/3/2026

📰 News · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • Google DeepMind released Gemma 4, including a 31B dense model and an MoE-based 26B A4B model, both supporting up to 256K context and native multimodal input (text, image, video, dynamic resolution).
  • The post claims Gemma 4 runs on NVIDIA B200 and AMD MI355X “from the same inference stack,” suggesting portability across major GPU/accelerator platforms.
  • On NVIDIA B200, the author reports roughly 15% higher output throughput than vLLM, indicating potential gains for high-throughput inference setups.
  • A free Modular playground is offered so users can test Gemma 4 without deploying infrastructure themselves.

Google DeepMind dropped Gemma 4 today:

Gemma 4 31B: dense, 256K context, redesigned architecture targeting efficiency and long-context quality

Gemma 4 26B A4B: MoE, 26B total / 4B active per forward pass, 256K context

Both are natively multimodal (text, image, video, dynamic resolution).
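
For context on the A4B naming: in an MoE model only a subset of experts fires per token, so inference compute tracks active rather than total parameters. A minimal back-of-the-envelope sketch, using the standard ~2 FLOPs per parameter per generated token approximation (the exact Gemma 4 expert layout isn't given in the post, so treat this as illustrative only):

```python
# Rough inference-cost comparison: dense 31B vs. MoE 26B with 4B active.
# Uses the common ~2 FLOPs per parameter per token approximation for a
# forward pass; real numbers depend on architecture details not stated here.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token: ~2 * active parameters."""
    return 2 * active_params

dense_31b = flops_per_token(31e9)  # all 31B parameters fire on every token
moe_a4b = flops_per_token(4e9)     # only ~4B of the 26B parameters fire per token

print(f"dense 31B : {dense_31b:.2e} FLOPs/token")
print(f"MoE A4B   : {moe_a4b:.2e} FLOPs/token")
print(f"ratio     : {dense_31b / moe_a4b:.1f}x less compute per token for the MoE")
```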

We got both running on MAX on launch day across NVIDIA B200 and AMD MI355X from the same stack. On B200 we're seeing 15% higher output throughput vs. vLLM (happy to share more on methodology if useful).
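
For anyone unsure what "output throughput" means in these comparisons: generated tokens per second across concurrent requests. Below is a minimal measurement sketch against an OpenAI-compatible /v1/chat/completions endpoint; both MAX and vLLM expose OpenAI-compatible serving APIs, so the same harness can hit either. The BASE_URL and MODEL values are placeholders, and this is not the author's actual methodology (real benchmarks also control prompt lengths, warmup, and request arrival patterns):

```python
# Minimal output-throughput harness for an OpenAI-compatible chat endpoint.
# BASE_URL and MODEL are placeholders -- point them at whichever server you
# are benchmarking and run the same request mix against each.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "http://localhost:8000/v1/chat/completions"  # placeholder endpoint
MODEL = "gemma-4-26b-a4b"                               # placeholder model id
CONCURRENCY = 32
REQUESTS = 128

def one_request(_: int) -> int:
    resp = requests.post(BASE_URL, json={
        "model": MODEL,
        "messages": [{"role": "user",
                      "content": "Summarize mixture-of-experts in 100 words."}],
        "max_tokens": 256,
    })
    resp.raise_for_status()
    # OpenAI-compatible servers report generated-token counts in usage.
    return resp.json()["usage"]["completion_tokens"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    completion_tokens = sum(pool.map(one_request, range(REQUESTS)))
elapsed = time.perf_counter() - start

print(f"output throughput: {completion_tokens / elapsed:.1f} tok/s "
      f"({completion_tokens} tokens over {elapsed:.1f}s, {CONCURRENCY} concurrent)")
```

Running an identical harness and request mix against both servers is what makes a "15% vs. vLLM" figure comparable at all, which is presumably what the author's offered methodology would detail.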

Free playground if you want to test without spinning anything up: https://www.modular.com/#playground

submitted by /u/carolinedfrasca