Gemma-4 26B-A4B + Opencode on M5 MacBook is *actually good*

Reddit r/LocalLLaMA / 4/3/2026

💬 OpinionSignals & Early TrendsTools & Practical UsageModels & Research

Key Points

  • A user reports that the Gemma-4 26B-A4B quantized model runs well on a 32GB M5 MacBook Air, citing strong prompt throughput and low heat/noise while in low-power mode.
  • The same setup is described as materially better for practical/agentic coding workflows than prior laptop experiences (M1 Max and an M4 Mac mini), particularly due to faster prompt processing and improved battery life.
  • Battery performance is reported as significantly improved on the M5 (up to ~6 hours expected under load vs. ~2 hours on the older M1 Max) making on-the-go coding more feasible.
  • The user tests Opencode from the laptop and concludes it is “usable,” but not yet a full drop-in replacement for closed-source coding assistants like Claude Code without additional user hand-holding.
  • The article frames Gemma-4 as a capable open alternative that may work best for users comfortable managing the extra configuration compared with frontier proprietary models.

TL;DR, 32gb M5 MacBook Air can run gemma-4-26B-A4B-it-UD-IQ4_XS at 300t/s PP and 12t/s generation (running in low power mode, uses 8W, making it the first laptop I've used to not get warm and noisy whilst running LLMs). Fast prompt processing + short thinking traces + can actually handle agentic behaviour = Opencode is actually usable from my laptop!

--

Previously I've been running LLMs off my M1 Max 64gb. And whilst it's been good enough for tinkering and toy use cases, it's never really been great for running anything that requires longer context... i.e. it could be useful as a simple chatbot but not much else. Making a single Snake game in Python was fine, but anything where I might want to do agentic coding / contribute to a larger codebase has always been a bit janky. And unless I artificially throttled generation speeds, anything I did would still chug at my battery - even on low power mode I'd get ~2 hours of AI usage away from the wall at most.

I did also get an M4 Mac Mini 16gb which was meant to be kind of an at-home server. But at that little RAM I was obviously limited to only pretty tiny models, and even then, the prompt processing speeds weren't anything to write home about lol

My M5 32gb on the other hand is actually really zippy with prompt processing (thank you new matmul cores!). It can get up to ~25% faster prompt processing speeds than my M1 Max even when the Max is not in power saving mode, and the base M5 really does sip at its battery in comparison - even if I run Opencode at full tilt the whole time, from my tests so far on battery saver I'd expect to get about ~6 hours of usage versus ~2 on the M1 Max, and that's with a smaller total battery size (70Wh vs 53.8Wh)! Which is great - I don't have to worry anymore about whether or not I'll actually be close enough to a plug if I go to a coffee shop, or if my battery will last the length of a longer train commute. Which are also the same sorts of times I'd be worried about my internet connection being too spotty to use something like Claude Code anyhow.

Now, the big question: is it good enough to replace Claude Code (and also Antigravity - I use both)?

I don't think anyone will be surprised that, no, lol, definitely not from my tests so far 😂

Don't get me wrong, it is actually pretty capable! And I don't think anyone was expecting that it'd replace closed source models in all scenarios. And actually, I'd rather use Gemma-4-26B than go back to a year ago when I would run out of Gemini-2.5-Pro allowance in Cursor and be forced to use Gemini-2.5-Flash. But Gemma-4 does (unsurprisingly) need far more hand-holding than current closed-source frontier models do from my experience. And whilst I'm sure some people will appreciate it, my opinion so far is that it's also kinda dry in its responses - not sure if it's because of Opencode's prompt or it just being Gemma-4's inherent way of speaking... but the best way I can describe it is that in terms of dry communication style, Gemma-4 | Opencode is to Claude | Claude Code what it is to Gemini-3.1-Pro | Antigravity. And I'm definitely much more of a Gemini-enjoyer lol

But yeah, honestly actually crazy to thank that this sort of agentic coding was cutting-edge / not even really possible with frontier models back at the end of 2024. And now I'm running it from a laptop so tiny that I can slip it in a tote bag and take it just about anywhere 😂

submitted by /u/maddie-lovelace
[link] [comments]