I'm shocked (Gemma 4 results)

Reddit r/LocalLLaMA / 4/5/2026


Key Points

  • The post shares benchmark results from dubesor.de/benchtable in which Gemma 4 31B (think), run locally as a Q4_K_M quant, scores 78.7% (rank 12).
  • It places Gemini 3 Flash (think) at 76.5% and Claude Sonnet 4 (think) at 74.7%, indicating close competition among top reasoning-focused models.
  • The same Gemma 4 31B quant with thinking disabled ("no think") scores 73.5%, a drop of 5.2 percentage points from the thinking run.
  • GPT-5.4 (Think) is listed at 72.8% (rank 29), the lowest of the six entries quoted in the post.
https://preview.redd.it/xv1p9zp1tdtg1.png?width=1210&format=png&auto=webp&s=f4cb3b32fd977b3e6d487915de9f985329060342

https://dubesor.de/benchtable

12. Gemma 4 31B (think), local Q4_K_M - 78.7%

16. Gemini 3 Flash (think) - 76.5%

19. Claude Sonnet 4 (think) - 74.7%

22. Claude Sonnet 4.5 (no think) - 73.8%

24. Gemma 4 31B (no think), local Q4_K_M - 73.5%

29. GPT-5.4 (Think) - 72.8%
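
For anyone who wants the gaps spelled out, here is a minimal Python sketch that computes how far each quoted entry sits below the highest-scoring entry in this post (not the top of the full benchtable). The ranks and scores are copied from the list above; the `results` list and the `gap_to_top_listed` helper are names made up for this illustration.

```python
# Ranks and scores as quoted in the post (percent).
results = [
    (12, "Gemma 4 31B (think), local Q4_K_M", 78.7),
    (16, "Gemini 3 Flash (think)", 76.5),
    (19, "Claude Sonnet 4 (think)", 74.7),
    (22, "Claude Sonnet 4.5 (no think)", 73.8),
    (24, "Gemma 4 31B (no think), local Q4_K_M", 73.5),
    (29, "GPT-5.4 (Think)", 72.8),
]


def gap_to_top_listed(rows):
    """Return percentage-point gaps to the best score among the given rows."""
    top = max(score for _, _, score in rows)
    # Round to one decimal to hide floating-point noise from the subtraction.
    return {name: round(top - score, 1) for _, name, score in rows}


gaps = gap_to_top_listed(results)
for name, gap in gaps.items():
    print(f"{name}: {gap:.1f} points below the top quoted entry")
```

The think vs. no-think difference for Gemma 4 31B is the entry for the "no think" row, since the thinking run happens to be the top quoted score.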

submitted by /u/Potential-Gold5298