Gemma 4 MoE hitting 120 TPS on Dual 3090s!

Reddit r/LocalLLaMA / 4/4/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • A Reddit post benchmarks Gemma 4, a Mixture-of-Experts (MoE) model, claiming roughly 120 tokens per second on dual NVIDIA RTX 3090 GPUs.

Thought I'd share some benchmark numbers from my local setup.

Hardware: Dual NVIDIA RTX 3090s
Model: Gemma 4 (MoE architecture)
Performance: ~120 tokens per second
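
For context, a number like this is usually taken from end-to-end generation timing. Here's a minimal sketch of that measurement, assuming an OpenAI-compatible local server such as llama.cpp's llama-server or vLLM; the URL, model name, and prompt are placeholders, not details from the post:

```python
# Rough sketch of how a local tokens-per-second figure is typically measured.
# Assumes an OpenAI-compatible server on localhost:8080; all names are placeholders.
import time
import requests

URL = "http://localhost:8080/v1/completions"
payload = {
    "model": "gemma-4-moe",  # placeholder model name
    "prompt": "Explain mixture-of-experts routing in two sentences.",
    "max_tokens": 512,
    "temperature": 0.7,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=600)
resp.raise_for_status()
elapsed = time.time() - start

# End-to-end timing also includes prompt processing, so this slightly
# understates pure decode speed.
generated = resp.json()["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```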

The efficiency of this MoE implementation is unreal. Even under heavy load, the throughput stays remarkably consistent. It's a massive upgrade for anyone running local LLMs for high-frequency tasks or complex agentic workflows.

The speed allows for near-instantaneous reasoning, which is a total paradigm shift compared to older dense models. If you have the VRAM to spare, this is definitely the way to go.
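
The dense-vs-MoE comparison mostly comes down to active versus total parameters: only the routed experts run for each token, so decode cost tracks the smaller active count. A back-of-envelope sketch with purely illustrative numbers (Gemma 4's real configuration isn't given in the post):

```python
# Why an MoE decodes faster than a dense model of the same total size:
# per-token compute and weight reads scale with the *active* parameters only.
# The counts below are purely illustrative, not Gemma 4's actual figures.
total_params = 100e9   # hypothetical total parameters across all experts
active_params = 15e9   # hypothetical parameters used per token (shared layers + routed experts)

# Decode throughput is roughly inversely proportional to per-token weight traffic,
# so this hypothetical MoE behaves more like a 15B dense model than a 100B one.
speedup_vs_dense = total_params / active_params
print(f"~{speedup_vs_dense:.1f}x less per-token compute than an equally sized dense model")
```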

submitted by /u/AaZzEL