Resources:
- https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF
- https://huggingface.co/unsloth/gemma-4-31B-it-GGUF
- https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF
- https://huggingface.co/unsloth/gemma-4-E2B-it-GGUF
- Collection: https://huggingface.co/collections/google/gemma-4
- What’s new in Gemma 4 (video): https://www.youtube.com/watch?v=jZVBoFOJK-Q

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on the small models) and generating text output. The release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. Offering both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well suited for tasks like text generation, coding, and reasoning. The models come in four sizes: E2B, E4B, 26B A4B, and 31B. This range makes them deployable everywhere from high-end phones to laptops and servers, democratizing access to state-of-the-art AI. Gemma 4 introduces key capability and architectural advancements, outlined below.
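The GGUF builds linked above run on any llama.cpp-based stack. Below is a minimal local-inference sketch using llama-cpp-python; the quantization filename is an assumption (check the repo's file listing for the actual name):

```python
# Minimal local-inference sketch for the GGUF builds linked above.
# ASSUMPTIONS: llama-cpp-python and huggingface_hub are installed, and the
# repo ships a Q4_K_M quant under the filename below; the exact filename
# is a guess, so check the repo's file list.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="unsloth/gemma-4-E4B-it-GGUF",
    filename="gemma-4-E4B-it-Q4_K_M.gguf",  # hypothetical filename
)

llm = Llama(model_path=model_path, n_ctx=8192)  # raise n_ctx for long-context work

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what's new in Gemma 4."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```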
Models Overview

Gemma 4 models are designed to deliver frontier-level performance at each size, targeting deployment scenarios from mobile and edge devices (E2B, E4B) to consumer GPUs and workstations (26B A4B, 31B). They are well suited for reasoning, agentic workflows, coding, and multimodal understanding. The models employ a hybrid attention mechanism that interleaves local sliding-window attention with full global attention, ensuring the final layer is always global (see the layer-schedule sketch below). This hybrid design delivers the processing speed and low memory footprint of a lightweight model without sacrificing the deep awareness required for complex, long-context tasks. To optimize memory for long contexts, global layers feature unified Keys and Values and apply Proportional RoPE (p-RoPE).

Core Capabilities

Gemma 4 models handle a broad range of tasks across text, vision, and audio.
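Returning to the hybrid attention design described under Models Overview, the sketch below shows one way such a layer schedule could be computed, with runs of local sliding-window layers interleaved with global layers and the final layer forced to be global. The 5:1 local-to-global ratio and the 1024-token window are illustrative assumptions, not published Gemma 4 values:

```python
# Illustrative layer schedule for hybrid attention: runs of local
# sliding-window layers interleaved with full global-attention layers.
# ASSUMPTIONS: the 5:1 local:global ratio and the 1024-token window are
# made up for illustration; only "final layer is always global" comes
# from the description above.

def build_layer_schedule(num_layers: int, locals_per_global: int = 5,
                         window: int = 1024) -> list[str]:
    schedule = []
    for i in range(num_layers):
        # Every (locals_per_global + 1)-th layer is global.
        is_global = (i + 1) % (locals_per_global + 1) == 0
        # The last layer is always global, per the Models Overview text.
        if i == num_layers - 1:
            is_global = True
        schedule.append("global" if is_global else f"local(window={window})")
    return schedule

if __name__ == "__main__":
    for idx, kind in enumerate(build_layer_schedule(12)):
        print(f"layer {idx:2d}: {kind}")
```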
Gemma 4 has been released
Reddit r/LocalLLaMA / 4/3/2026
Key Points
- Gemma 4, a new open-weights model family from Google DeepMind, has been released with multimodal capabilities that accept text and images and output text, with audio supported on the smaller models (a minimal inference sketch follows this list).
- The release includes both pre-trained and instruction-tuned variants across four sizes (E2B, E4B, 26B A4B, and 31B), enabling deployment from mobile devices to servers.
- Gemma 4 supports very long context windows—up to 128K tokens for smaller models and up to 256K tokens for medium models—while maintaining multilingual coverage across 140+ languages.
- Architecturally, the family offers both Dense and Mixture-of-Experts (MoE) designs and is positioned as a highly capable “reasoner” with configurable thinking modes.
- It expands modalities beyond images to include video and audio on the E2B/E4B models, and emphasizes improved coding and agentic capabilities.
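As noted in the first point above, the instruction-tuned variants accept images alongside text. A minimal sketch of multimodal inference via the transformers image-text-to-text pipeline follows; the model id is hypothetical, and pipeline support for these checkpoints is an assumption, not something the post confirms:

```python
# Minimal multimodal (image + text) inference sketch.
# ASSUMPTIONS: the instruction-tuned checkpoint is published under the
# hypothetical id "google/gemma-4-e4b-it" and works with transformers'
# "image-text-to-text" pipeline; treat this as a pattern, not a recipe.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-4-e4b-it")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"])  # contains the model's reply
```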