Gemma 4 just casually destroyed every model on our leaderboard except Opus 4.6 and GPT-5.2. 31B params, $0.20/run

Reddit r/LocalLLaMA / 4/6/2026




Key Points

FoodTruck Bench reported that Gemma 4 (31B) survived and turned a profit in all 5 of 5 runs, with a reported +1,144% median ROI at $0.20 per run.
The benchmark claims Gemma 4 outperformed several major models on cost-to-performance, including GPT-5.2, Gemini 3 Pro, and Sonnet 4.6, and outpaced multiple Chinese open-source models tested.
The only model that reportedly beat Gemma 4 was Opus 4.6, but it cost $36 per run (about 180× more expensive), highlighting a potential advantage in efficiency.
The authors state they used identical configurations and simulation settings across models (same prompt, tools, seed, and model-ID verification) to support the claim that the performance difference comes from the model itself.
They recommend trying Gemma 4 for "agentic workflows" and describe it as the best cost-to-performance ratio they have seen across the 22 models tested so far.


Tested Gemma 4 (31B) on our benchmark. Genuinely did not expect this.

100% survival, 5 out of 5 runs profitable, +1,144% median ROI. At $0.20 per run.
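The post does not spell out how these metrics are computed. Below is a minimal sketch, assuming the usual definitions: a run "survives" if the truck stays solvent for all 30 days, a run is "profitable" if it ends above its starting cash, and ROI is (final − start) / start. The `RunResult` record and its field names are hypothetical, not FoodTruck Bench's actual schema.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RunResult:
    # Hypothetical per-run record; the benchmark's real output format is not described in the post.
    starting_cash: float
    final_net_worth: float
    survived: bool  # assumed meaning: stayed solvent for the full simulated period


def roi_percent(run: RunResult) -> float:
    # Standard return on investment, expressed as a percentage of starting cash.
    return (run.final_net_worth - run.starting_cash) / run.starting_cash * 100


def summarize(runs: list[RunResult]) -> dict:
    # Aggregate several runs into the three headline numbers quoted in the post.
    return {
        "survival_rate": sum(r.survived for r in runs) / len(runs),
        "profitable_runs": sum(r.final_net_worth > r.starting_cash for r in runs),
        "median_roi_percent": median(roi_percent(r) for r in runs),
    }
```

Under this definition, a +1,144% ROI means a run ended with roughly 12.44× its starting capital. Using the median rather than the mean keeps one unusually lucky or unlucky run from dominating a 5-run summary.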

It outperforms GPT-5.2 ($4.43/run), Gemini 3 Pro ($2.95/run), Sonnet 4.6 ($7.90/run), and absolutely destroys every Chinese open-source model we've tested — Qwen 3.5 397B, Qwen 3.5 9B, DeepSeek V3.2, GLM-5. None of them even survive consistently.

The only model that beats Gemma 4 is Opus 4.6 at $36 per run. That's 180× more expensive.
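The cost gap can be checked directly from the per-run prices quoted above. This small sketch divides each model's quoted cost by Gemma 4's $0.20. The dictionary simply restates the post's numbers; the helper name `cost_multiple` is illustrative.

```python
# Per-run costs in USD, as quoted in the post.
COST_PER_RUN = {
    "Gemma 4 (31B)": 0.20,
    "Gemini 3 Pro": 2.95,
    "GPT-5.2": 4.43,
    "Sonnet 4.6": 7.90,
    "Opus 4.6": 36.00,
}


def cost_multiple(model: str, baseline: str = "Gemma 4 (31B)") -> float:
    """How many times more one run of `model` costs than one run of `baseline`."""
    return COST_PER_RUN[model] / COST_PER_RUN[baseline]


for name in COST_PER_RUN:
    print(f"{name:>14}: {cost_multiple(name):6.1f}x")
```

This confirms the 180× figure for Opus 4.6 and puts the other comparisons in the same terms: roughly 15× for Gemini 3 Pro, 22× for GPT-5.2, and 40× for Sonnet 4.6.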

31 billion parameters. Twenty cents. We double-checked the config, the prompt, the model ID — everything is identical to every other model on the leaderboard. Same seed, same tools, same simulation. It's just this good.

Strongly recommend trying it for your agentic workflows. We've tested 22 models so far and this is by far the best cost-to-performance ratio we've ever seen.

Full breakdown with charts and day-by-day analysis: foodtruckbench.com/blog/gemma-4-31b

FoodTruck Bench is an AI business simulation benchmark — the agent runs a food truck for 30 days, making decisions about location, menu, pricing, staff, and inventory. Leaderboard at foodtruckbench.com
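For readers wondering what "same seed, same tools, same simulation" buys, here is a minimal sketch of a 30-day agent loop of this general kind. Everything in it is hypothetical: the `Decision` fields mirror the five decision areas named in the post, but the demand model, costs, and starting cash are invented for illustration and are not how FoodTruck Bench actually works.

```python
import random
from dataclasses import dataclass


@dataclass
class Decision:
    # The five decision areas named in the post; field types are assumptions.
    location: str        # not used by the toy demand model below
    menu: list[str]      # not used by the toy demand model below
    price: float
    staff: int
    inventory_units: int


@dataclass
class DayResult:
    day: int
    revenue: float
    costs: float
    cash: float


def run_episode(policy, days: int = 30, starting_cash: float = 1000.0,
                seed: int = 42) -> list[DayResult]:
    # A fixed seed means every policy (model) faces the same sequence of demand draws.
    rng = random.Random(seed)
    cash = starting_cash
    history: list[DayResult] = []
    for day in range(1, days + 1):
        decision = policy(day, cash, history)
        # Toy demand model (hypothetical): random foot traffic, fewer buyers at higher prices.
        foot_traffic = rng.randint(50, 150)
        demand = max(0, int(foot_traffic * (1 - decision.price / 20)))
        # Sales are capped by demand, stock on hand, and staff capacity (40 customers each).
        sold = min(demand, decision.inventory_units, decision.staff * 40)
        revenue = sold * decision.price
        # Hypothetical costs: $2 per inventory unit and $80 per staff member per day.
        costs = decision.inventory_units * 2.0 + decision.staff * 80.0
        cash += revenue - costs
        history.append(DayResult(day, revenue, costs, cash))
        if cash < 0:  # bankrupt: the run ends early and counts as not surviving
            break
    return history


def fixed_policy(day, cash, history):
    # Trivial stand-in for a model: makes the same choices every day.
    return Decision("downtown", ["tacos"], 8.0, 2, 80)


history = run_episode(fixed_policy)
survived = len(history) == 30 and history[-1].cash >= 0
```

In a real harness the policy would be an LLM call that receives the day's state and returns its decisions through tool calls. Holding the seed, prompt, and tools fixed is what lets differences in outcome be attributed to the model rather than to luck.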

submitted by /u/Disastrous_Theme5906