Gemma 4 26b is the perfect all around local model and I'm surprised how well it does.

Reddit r/LocalLLaMA / 4/5/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A user testing local LLMs on a 64GB RAM Mac found that Gemma 4 26B was fast and consistently able to generate a working Doom-style raycaster in HTML/JavaScript after only a few prompts.
  • Compared with Qwen 3 Coder (especially 4-bit), the user reports Gemma 4 was less likely to overload the system, less prone to tool-call parameter loops, and more reliable at completing the task.
  • The user also reports that Qwen 3.5 (around 30B MoE) failed to complete the same test, getting stuck in extended “thinking” loops and repeatedly rewriting the same file without finishing.
  • The overall takeaway is a strong impression that Gemma 4 26B is a practical "all-around" local model for coding workflows, and that local models are becoming increasingly competitive over time.
  • The post is framed as experiential/benchmark-by-task rather than a formal evaluation, focusing on responsiveness, stability, and code completion on consumer hardware.

I got a 64GB memory Mac about a month ago and I've been trying to find a model that is reasonably quick, decently good at coding, and doesn't overload my system. The test I've been running is having it create a Doom-style raycaster in HTML and JS.
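For context on what this test asks a model to produce: the core of a Doom-style raycaster is a per-screen-column distance calculation against a grid map, usually done with a DDA (digital differential analyzer) march. A minimal sketch of that core in plain JavaScript follows; the map layout, function names, and coordinates here are illustrative assumptions, not taken from the post, and a real version would loop over columns and draw scaled wall slices to a canvas.

```javascript
// Minimal grid-based raycaster core (DDA): cast one ray from the
// player's position and return the distance to the nearest wall cell.
// 1 = wall, 0 = empty floor. Map indexed as MAP[y][x].
const MAP = [
  [1, 1, 1, 1, 1],
  [1, 0, 0, 0, 1],
  [1, 0, 0, 0, 1],
  [1, 1, 1, 1, 1],
];

function castRay(px, py, angle) {
  const dx = Math.cos(angle), dy = Math.sin(angle);
  let mapX = Math.floor(px), mapY = Math.floor(py);
  // Distance the ray travels to cross one full grid cell in x / y.
  const deltaX = Math.abs(1 / dx), deltaY = Math.abs(1 / dy);
  let stepX, stepY, sideX, sideY;
  if (dx < 0) { stepX = -1; sideX = (px - mapX) * deltaX; }
  else        { stepX = 1;  sideX = (mapX + 1 - px) * deltaX; }
  if (dy < 0) { stepY = -1; sideY = (py - mapY) * deltaY; }
  else        { stepY = 1;  sideY = (mapY + 1 - py) * deltaY; }
  // March cell by cell, always advancing along the nearer boundary,
  // until the ray enters a wall tile.
  let side;
  while (true) {
    if (sideX < sideY) { sideX += deltaX; mapX += stepX; side = 0; }
    else               { sideY += deltaY; mapY += stepY; side = 1; }
    if (MAP[mapY][mapX] !== 0) break;
  }
  // Perpendicular distance (not euclidean) avoids the fisheye effect.
  return side === 0 ? sideX - deltaX : sideY - deltaY;
}

// Looking along +x from (1.5, 1.5): the wall column is at x = 4,
// so the ray travels 2.5 cells.
console.log(castRay(1.5, 1.5, 0)); // → 2.5
```

In a full renderer, each screen column gets a ray at a slightly different angle across the field of view, and the returned distance sets the height of the wall slice drawn for that column — which is why a working one-file HTML/JS version makes a decent quick test of a coding model.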

I've been told Qwen 3 Coder Next was the king, and while it's good, the 4-bit variant always put my system near the edge. I don't know if it was because of the 4-bit quantization, but it would also miss tool uses and get stuck in a loop guessing the right params. In the Doom test it would usually get there and make something decent, but only after getting stuck in a loop of bad tool calls for a while.

Qwen 3.5 (the near-30B MoE variant) could never do it in my experience. It always got stuck in a thinking loop and then became so unsure of itself that it would just rewrite the same file over and over and never finish.

But Gemma 4 just crushed it, making something that worked after only 3 prompts. It was very fast too. It also limited its thinking and didn't get too lost in details; it just did it. It's the first time I've run a local model and been actually surprised that it worked great, without any weirdness.

It makes me excited about the future of local models, and I wouldn't be surprised if in 2-3 years we'll be able to use very capable local models that can compete with the sonnets of the world.

submitted by /u/pizzaisprettyneato