Gemma4:26b's reasoning capabilities are crazy.

Reddit r/LocalLLaMA / 4/6/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The author reports experimenting with Gemma4 26B MoE and says its reasoning performance is a major step up for their multi-tool agent tasks, outperforming models like Gemini-3-Flash and other local options they tested.
  • Their setup uses a Gemini SDK/agent hub with Raspberry Pi satellites for a smart-home style voice/speaker architecture, plus a Discord bot for more complex interactions and image-based tasks like connector pinouts.
  • A key benchmark (“send me my grocery list when I get to Walmart”) requires multiple tool calls (memory lookup for the right store, geocoding from address/location, grocery list retrieval, and phone notification scheduling), and the author claims Gemma4 handles it reliably while other local models often fail, especially when RAG/memory retrieval is imperfect.
  • The workflow relies on script-level optimizations to reduce token/tool input and uses planning/semantic tool injection while keeping explicit CoT off, suggesting Gemma4 still benefits from structured tool-driven “pseudo-reasoning.”
  • The author describes the interaction experience as similar to Gemini 3 Flash, but with occasional need for additional prompting rather than fully re-providing step-by-step instructions.

Been experimenting with it, first on my buddy's compute he let me borrow, and then with the Gemini SDK so that I don't need to keep stealing his MacBook from 600 miles away. Originally my home agent ran through Gemini-3-Flash because no other model I've tried has been able to match its reasoning ability.

The script(s) I have it running through are a re-implementation of a multi-speaker smart home speaker setup, with several Raspberry Pi Zeros functioning as speaker satellites for a central LLM hub (right now a Raspberry Pi 5, soon to be an M4 Mac mini prepped for full local operation). It also has a dedicated Discord bot I use to interact with it from my phone and PC for more complicated tasks, and for those requiring information from an image, like connector pinouts I want help with.

I've been experimenting with all sorts of local models, optimizing my scripts to reduce token input from tools and RAG so local models can function without getting confused, but none of them have been able to keep up. My main benchmark, "send me my grocery list when I get to Walmart," requires a solid 6 different tool calls to get right: learning which Walmart I mean from the memory database (especially challenging if RAG fails to pull it up), getting GPS coordinates for the relevant Walmart by finding its address and feeding it into a dedicated tool that returns coordinates from an address or general location (Walmart, [CITY, STATE]), finding my grocery list within its lists database, and setting up a phone notification event with that list, nicely formatted, for when I approach those coordinates. The only local model I was able to get to perform that task was GPT-OSS 120B, and I'll never have the hardware to run that locally. Even OSS still got confused, only successfully performing the task with a completely clean chat history. Mind you, I keep my chat history limited to 30 entries shared between user, model, and tool inputs/returns. Most of its ability to hold a longer conversation comes from aggressive memory database updates and RAG.

Enter Gemma4, the 26B MoE specifically. It handles the Walmart task beautifully. I started trying other agentic tasks: research on weird stuff for my obscure project car, standalone ECU crank trigger stuff, among other topics. A lot of the work is done through dedicated planning tools to keep it fast with CoT/reasoning turned off while still providing a sort of pseudo-reasoning, plus my tools + semantic tool injection to try to keep it focused. But even with all that helping it, no other model family has been able to begin to handle what I've been throwing at it.
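By "semantic tool injection" I mean only handing the model the few tool schemas most relevant to the current message, instead of the whole toolbox every turn. A toy version of the idea (real setups would rank by embedding similarity; simple word overlap and all tool names/descriptions here are placeholders I made up for the sketch):

```python
# Toy sketch of semantic tool injection: rank available tools against the
# user message and inject only the top-k schemas into the prompt.
# Word-overlap scoring stands in for embedding similarity; the tool
# catalog below is purely illustrative.

TOOLS = {
    "geocode": "return coordinates for an address or general location",
    "get_list": "fetch a named list from the lists database",
    "memory_search": "look up facts about the user in the memory database",
    "set_timer": "start a countdown timer on a speaker satellite",
}

def rank_tools(message, tools=TOOLS, k=2):
    """Return the k tool names whose descriptions best match the message."""
    words = set(message.lower().split())
    scored = sorted(
        tools,
        key=lambda name: len(words & set(tools[name].lower().split())),
        reverse=True,
    )
    return scored[:k]
```

With the full catalog hidden and only two or three candidate tools in context, a small model has far fewer chances to grab the wrong tool or drown in schema tokens.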

It's wild. Interacting with it feels almost exactly like interacting with 3 Flash. It's a little stupider in some areas, but usually only to the point where it needs a bit more nudging, rather than fully laid-out instructions on what to do, to the point where I might as well do it all myself like I have to with other models.

Just absolutely beyond impressed with its capabilities for how small and fast it is.

submitted by /u/Mrinohk