Qwen3.5 vs Gemma 4: Benchmarks vs real world use?

Reddit r/LocalLLaMA / 4/3/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • A user reports local testing of Gemma 4 2B on an older RTX 2060 (6GB VRAM) and intensive prior use of Qwen3.5 across sizes in customer projects.
  • The user claims Gemma 4 2B is faster, uses less memory, and produces better structured outputs, including improved Mermaid chart generation.
  • They describe Gemma 4 2B as more “agentic” and more capable in real-world use overall, saying it feels closer to Qwen3.5 9B than its small size would suggest.
  • The post raises doubts about how benchmark results are interpreted, speculating that Qwen3.5 may be “bench-maxed” or that Google may be downplaying Gemma 4’s real-world performance.
  • Overall, the discussion emphasizes that benchmark scores may not fully predict practical outcomes like speed, formatting quality, and agent-like interaction.

Just tested Gemma 4 2B locally on an old RTX 2060 (6GB VRAM); before that I'd used Qwen3.5 intensively, in all sizes, in customer projects.
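
If you want to reproduce this kind of quick smoke test, here's roughly the setup, as a minimal sketch against an OpenAI-compatible local endpoint (e.g. Ollama's /v1 API). The model tags gemma4:2b and qwen3.5:2b are placeholders, not confirmed release names; swap in whatever your local server actually serves:

```python
# Minimal local smoke test via an OpenAI-compatible endpoint (e.g. Ollama).
# The model tags below are placeholders, not confirmed release names.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

for model in ("gemma4:2b", "qwen3.5:2b"):  # placeholder tags
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Explain VRAM in two sentences."}],
        max_tokens=128,
    )
    print(f"--- {model} ---")
    print(resp.choices[0].message.content)
```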

First impression of Gemma 4 2B: it's better and faster, and uses less memory than Qwen3.5 2B. More agentic, better Mermaid charts, better chat output, better structured output. The Mermaid claim is easy to spot-check yourself, see the sketch below.
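
A rough way to spot-check Mermaid output quality: ask for a diagram and verify the reply starts with a recognizable diagram-type keyword. Same placeholder model tag as above; this is just a sketch of the kind of check I mean, not a rigorous eval:

```python
# Crude structural check: does the reply contain a line that starts with a
# Mermaid diagram-type keyword? Model tag is a placeholder, as above.
import re
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="gemma4:2b",  # placeholder, not a confirmed tag
    messages=[{
        "role": "user",
        "content": "Draw a Mermaid flowchart of a user login flow. "
                   "Reply with only the Mermaid code.",
    }],
)
text = resp.choices[0].message.content
# Mermaid diagrams open with a diagram-type keyword on its own line.
ok = re.search(r"^\s*(flowchart|graph|sequenceDiagram)\b", text, re.MULTILINE)
print("looks like valid Mermaid:", bool(ok))
```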

It seems like either the Qwen3.5 models are bench-maxed (although they really were much better than the competition) or Google is playing Gemma 4 down. Gemma 4 2B "seems" / "feels" more like Qwen3.5 9B to me.

submitted by /u/AppealSame4367