Gemma 4 MTP released

Reddit r/LocalLLaMA / 5/6/2026

📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • Google has released Gemma 4 Multi-Token Prediction (MTP) drafters: smaller, faster draft models that extend the Gemma 4 models for speculative decoding.
  • In an MTP speculative decoding pipeline, the draft model predicts several tokens ahead and the target model verifies them in parallel (see the sketches below).
  • The approach yields up to 2× faster decoding while guaranteeing output identical to standard generation from the target model.
  • The released Hugging Face checkpoints cover the Gemma 4 family (31B, 26B-A4B, E4B, E2B) and are positioned for low-latency and on-device deployments.
  • Per their model cards, these checkpoints are published specifically as MTP “drafters”, meaning they are meant to be plugged into a speculative decoding stack rather than used as standalone chat models.
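
For concreteness, here is a minimal sketch of how a drafter like this could be wired up with Hugging Face transformers, which exposes speculative decoding through the `assistant_model` argument of `generate()`. The target repo id (`google/gemma-4-31B-it`) is an assumption, since the post only links the drafter checkpoints, and whether these MTP drafters load and behave as ordinary assistant models in assisted generation is also an assumption:

```python
# Hedged sketch: assumes the MTP drafter can be loaded as an ordinary
# causal LM and passed to transformers' assisted generation. The target
# repo id below is a guess; the post only lists the "-assistant" drafters.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "google/gemma-4-31B-it"           # assumed target checkpoint
draft_id = "google/gemma-4-31B-it-assistant"  # drafter linked in the post

tok = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_id, device_map="auto")

inputs = tok("Summarize speculative decoding in one sentence.",
             return_tensors="pt").to(target.device)

# assistant_model switches generate() into assisted (speculative) decoding:
# the draft proposes tokens, the target verifies them in one forward pass.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```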

Blog post:

https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/

MTP draft models:

https://huggingface.co/google/gemma-4-31B-it-assistant

https://huggingface.co/google/gemma-4-26B-A4B-it-assistant

https://huggingface.co/google/gemma-4-E4B-it-assistant

https://huggingface.co/google/gemma-4-E2B-it-assistant

From the model card:

This model card is for the Multi-Token Prediction (MTP) drafters for the Gemma 4 models. MTP is implemented by extending the base model with a smaller, faster draft model. When used in a Speculative Decoding pipeline, the draft model predicts several tokens ahead, which the target model then verifies in parallel. This results in significant decoding speedups (up to 2x) while guaranteeing the exact same quality as standard generation, making these checkpoints perfect for low-latency and on-device applications.
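
To make the "predict ahead, verify in parallel" loop concrete, here is a toy, self-contained sketch of the greedy variant of speculative decoding. The stand-in model functions are invented for illustration; in a real pipeline the verification step is a single batched forward pass of the target model, and the full algorithm uses a probabilistic accept/reject rule to match the target's sampling distribution rather than the simple token-agreement check shown here:

```python
# Toy illustration of the draft-and-verify loop (greedy variant).
# The stand-in "models" below are invented for illustration only.

K = 4  # number of tokens the draft speculates per step
VOCAB_SIZE = 10

def target_next(seq):
    # Stand-in for the big target model's greedy next token.
    return (sum(seq) + 1) % VOCAB_SIZE

def draft_next(seq):
    # Stand-in for the small drafter; agrees with the target most of
    # the time but not always, so some proposals get rejected.
    bump = 2 if len(seq) % 5 == 0 else 1
    return (sum(seq) + bump) % VOCAB_SIZE

def speculative_step(seq):
    # 1) The draft autoregressively proposes K tokens.
    proposal, work = [], list(seq)
    for _ in range(K):
        t = draft_next(work)
        proposal.append(t)
        work.append(t)

    # 2) The target "verifies in parallel": in a real system this is one
    #    forward pass; we keep the longest prefix where it agrees.
    out, work = list(seq), list(seq)
    for t in proposal:
        expected = target_next(work)
        if expected == t:           # target agrees with the draft
            out.append(t)
            work.append(t)
        else:                       # first mismatch: keep the target's
            out.append(expected)    # own token and stop this round
            return out
    # All K accepted; the same target pass also yields one bonus token.
    out.append(target_next(work))
    return out

seq = [3, 1, 4]
for _ in range(4):
    seq = speculative_step(seq)
print(seq)  # token-for-token identical to target-only greedy decoding
```

Because every appended token is either a draft token the target agrees with or the target's own choice, the result is identical to decoding with the target alone; the speedup comes from verifying several positions per target pass instead of one.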

submitted by /u/rerri