Gemma 4: Byte for byte, the most capable open models

Simon Willison's Blog / 4/3/2026


Key Points

  • Google DeepMind introduced four new Apache 2.0 licensed, vision-capable reasoning LLMs in the Gemma 4 family: 2B, 4B, 31B, and a 26B-A4B mixture-of-experts model.
  • The release emphasizes “intelligence-per-parameter,” including a technique called Per-Layer Embeddings (PLE) that targets more efficient on-device deployments by using token lookup tables per decoder layer.
  • A key system-card detail is that the “E” in the smaller models (E2B/E4B) refers to an “effective” parameter count, which is lower than total parameters due to the way PLE embeddings are used.
  • The blogger tested the models via GGUF files in LM Studio and reports that the 2B, 4B, and 26B-A4B models work, while the 31B model loops outputting “---” for prompts, suggesting a potential distribution/compatibility issue.
  • The post frames Gemma 4 as part of a broader early trend: rapid progress in small, capable open models and parameter-efficient design for practical deployments.

2nd April 2026 - Link Blog

Gemma 4: Byte for byte, the most capable open models. Four new vision-capable Apache 2.0 licensed reasoning LLMs from Google DeepMind, sized at 2B, 4B, 31B, plus a 26B-A4B Mixture-of-Experts.

Google emphasize an "unprecedented level of intelligence-per-parameter", providing yet more evidence that creating small, useful models is one of the hottest areas of research right now.

They actually label the two smaller models as E2B and E4B for "Effective" parameter size. The system card explains:

The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total.

I don't entirely understand that, but apparently that's what the "E" in E2B means!
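My rough reading of that excerpt: each decoder layer gets its own per-token lookup table, so capacity grows without adding compute-heavy weight matrices, and those tables are excluded from the "effective" count. Here's a toy sketch of that idea as I understand it from the system card quote above; the sizes and structure are made up for illustration and are not Gemma 4's actual configuration.

```python
# Toy sketch of Per-Layer Embeddings (PLE) as described in the quoted
# system-card excerpt: one small embedding table per decoder layer,
# consulted by cheap index lookups rather than matrix multiplications.
# All dimensions here are illustrative, not Gemma 4's real config.

import random

VOCAB = 8    # toy vocabulary size
DIM = 4      # toy per-layer embedding width
LAYERS = 3   # toy decoder depth

random.seed(0)

# table[layer][token_id] -> embedding vector. These tables count toward
# *total* parameters, but since they are only read by index, they are
# excluded from the "effective" count that the E2B/E4B names refer to.
ple_tables = [
    [[random.random() for _ in range(DIM)] for _ in range(VOCAB)]
    for _ in range(LAYERS)
]

def layer_embedding(layer: int, token_id: int) -> list[float]:
    """O(1) lookup of the per-layer embedding for one token."""
    return ple_tables[layer][token_id]

total_ple_params = LAYERS * VOCAB * DIM
print(total_ple_params)
```

The point of the design, as I read it, is that lookups like this are cheap enough to page in from slower storage on-device, so the tables inflate total parameters without inflating the working set.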

I tried them out using the GGUFs for LM Studio. The 2B (4.41GB), 4B (6.33GB) and 26B-A4B (17.99GB) models all worked perfectly, but the 31B (19.89GB) model was broken and spat out "--- " in a loop for every prompt I tried.
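Out of curiosity, those download sizes can be turned into a back-of-envelope bits-per-parameter estimate, using the nominal counts from the model names (my own arithmetic, not anything from the release):

```python
# Rough estimate: GGUF file size divided by nominal parameter count gives
# bits per weight, hinting at the quantization level. Parameter counts are
# taken at face value from the model names; real GGUFs mix quantization
# types, so this is only a ballpark.

GIB = 1024 ** 3

models = {  # name: (nominal params, reported download size in GB)
    "E2B": (2e9, 4.41),
    "E4B": (4e9, 6.33),
    "26B-A4B": (26e9, 17.99),
    "31B": (31e9, 19.89),
}

for name, (params, size_gb) in models.items():
    bits = size_gb * GIB * 8 / params
    print(f"{name}: ~{bits:.1f} bits per parameter")
```

The two large models land around 5 bits per parameter, consistent with a mid-range GGUF quantization, while the E2B/E4B files are much bigger than their effective counts alone would explain. Presumably that's the PLE tables again: total parameters exceed the effective count.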

The progression in pelican quality from 2B to 4B to 26B-A4B is notable:

E2B:

Two blue circles on a brown rectangle and a weird mess of orange blob and yellow triangle for the pelican

E4B:

Two black wheels joined by a sort of grey surfboard, the pelican is semicircles and a blue blob floating above it

26B-A4B:

Bicycle has the right pieces, although the frame is wonky. Pelican is genuinely good: it has a big triangle beak and a nice curved neck, and is clearly a bird sitting on the bicycle

(This one actually had an SVG error - "error on line 18 at column 88: Attribute x1 redefined" - but after fixing that I got probably the best pelican I've seen yet from a model that runs on my laptop.)
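For anyone hitting the same class of error: "Attribute x1 redefined" means an element repeats an attribute, which XML forbids. One mechanical fix, sketched below, is to keep only the first occurrence of each attribute in a tag (this isn't necessarily how I fixed it here, and the broken <line> element is an invented example, not the model's actual output):

```python
# Drop repeated attributes from XML-ish tags so a duplicate-attribute SVG
# parses again. Regex-based and deliberately naive: good enough for a quick
# repair of model output, not a general XML rewriter.

import re
import xml.etree.ElementTree as ET

def drop_duplicate_attrs(svg: str) -> str:
    """Within each opening tag, keep only the first copy of each attribute."""
    def fix_tag(match: re.Match) -> str:
        name, attrs_src = match.group(1), match.group(2)
        seen, kept = set(), []
        for attr in re.finditer(r'([\w:-]+)\s*=\s*("[^"]*")', attrs_src):
            if attr.group(1) not in seen:
                seen.add(attr.group(1))
                kept.append(f'{attr.group(1)}={attr.group(2)}')
        trailing = "/" if attrs_src.rstrip().endswith("/") else ""
        return f'<{name} {" ".join(kept)}{trailing}>'
    # Only matches opening tags with attributes; closing tags are untouched.
    return re.sub(r'<([\w:-]+)\s([^<>]*)>', fix_tag, svg)

broken = ('<svg xmlns="http://www.w3.org/2000/svg">'
          '<line x1="0" y1="0" x1="5" y2="9"/></svg>')
fixed = drop_duplicate_attrs(broken)
ET.fromstring(fixed)  # parses cleanly once the duplicate x1 is gone
print(fixed)
```

Keeping the first occurrence is a judgment call; keeping the last would be equally defensible, since there's no telling which value the model "meant".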

Google are providing API access to the two larger Gemma models via their AI Studio. I added support to llm-gemini and then ran a pelican through the 31B model using that:

llm -m gemini/gemma-4-31b-it 'Generate an SVG of a pelican riding a bicycle'

Pretty good, though it is missing the front part of the bicycle frame:

Motion blur lines, a mostly great bicycle albeit missing the front part of the frame. Pelican is decent.

Posted 2nd April 2026 at 6:28 pm


Tags: google, ai, generative-ai, local-llms, llms, llm, vision-llms, pelican-riding-a-bicycle, llm-reasoning, gemma, llm-release, lm-studio
