12. Gemma 4 31B (think), Q4_K_M local - 78.7%
16. Gemini 3 Flash (think) - 76.5%
19. Claude Sonnet 4 (think) - 74.7%
22. Claude Sonnet 4.5 (no think) - 73.8%
24. Gemma 4 31B (no think), Q4_K_M local - 73.5%
29. GPT-5.4 (think) - 72.8%
I'm shocked (Gemma 4 results)
Reddit r/LocalLLaMA / 4/5/2026
💬 Opinion · Signals & Early Trends · Models & Research
Key Points
- The post shares reported benchmark results for “Gemma 4 31B (think)” run locally at Q4_K_M quantization, where it scores 78.7%.
- It places Gemini 3 Flash (think) at 76.5% and Claude Sonnet 4 (think) at 74.7%, indicating close competition among top reasoning-focused models.
- The results also include a “no think” variant of Gemma 4 31B at 73.5%, a drop of 5.2 points when the reasoning mode is disabled.
- A further entry lists GPT-5.4 (think) at 72.8%, placing it below all of the scores above in this particular table.
Related Articles

Black Hat Asia
AI Business

Lainux -- The Secure OS for AI Builders
Dev.to

The Harness is All You Need
Dev.to

The Rise of the AI-Native Account Executive: What Top Infrastructure Companies are Looking For
Dev.to

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.
Dev.to