Google speeds up Gemma 4 threefold with multi-token prediction

THE DECODER / 5/7/2026



Key Points

Google has introduced multi-token prediction “drafters” for the open Gemma 4 model family, improving text generation speed by up to three times.
A smaller auxiliary model proposes multiple candidate tokens at once, reducing the number of sequential generation steps the main model has to run.
The main Gemma 4 model verifies the proposed tokens in a single pass, maintaining output quality while accelerating inference.
The update is aimed at making open-model text generation more efficient for real-world deployments.
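A rough way to see why drafting cuts the number of main-model steps relies on a simplifying assumption common in the speculative-decoding literature, not on anything Google has published for Gemma 4. Assume each drafted token is accepted independently with probability α, and the drafter proposes γ tokens per round. Every main-model pass then yields the accepted drafts plus one token from the main model itself, so the expected output per pass is:

```latex
\mathbb{E}[\text{tokens per main-model pass}]
  = \sum_{i=0}^{\gamma} \alpha^{i}
  = \frac{1 - \alpha^{\gamma + 1}}{1 - \alpha},
\qquad
\alpha = 0.8,\ \gamma = 4 \;\Rightarrow\; \frac{1 - 0.8^{5}}{0.2} = \frac{0.67232}{0.2} \approx 3.36
```

The values α = 0.8 and γ = 4 are purely illustrative and are not Gemma 4 measurements. Real end-to-end speedups are lower than this ratio because the drafter's own compute is not counted.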

Google has released multi-token prediction drafters for its Gemma 4 open model family that speed up text generation by up to three times. A small auxiliary model suggests several tokens at once while the main model checks them in a single pass.
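The draft-then-verify loop described here follows the general pattern known as speculative decoding. Below is a minimal sketch in Python, assuming greedy decoding and toy deterministic "models" over integer tokens. The names `main_model`, `draft_model`, `greedy_decode` and `speculative_decode` are hypothetical illustrations, not part of any Gemma or Google API, and this is not Google's implementation. The main model's k + 1 checks are simulated with a list comprehension, whereas a real system computes them in one batched forward pass.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in type: a "model" maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]

VOCAB = 50  # toy vocabulary size (illustrative)


def main_model(prefix: List[int]) -> int:
    """Toy deterministic stand-in for the large model (not Gemma)."""
    return (sum(prefix) * 7 + len(prefix)) % VOCAB


def draft_model(prefix: List[int]) -> int:
    """Toy drafter: agrees with main_model except when the prefix length is a multiple of 5."""
    guess = main_model(prefix)
    return guess if len(prefix) % 5 else (guess + 1) % VOCAB


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one main-model step per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(main: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-and-verify loop; returns (tokens, number of main-model passes)."""
    out = list(prompt)
    target_len = len(prompt) + n_new
    main_passes = 0
    while len(out) < target_len:
        # 1. The cheap drafter proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The main model predicts the next token at all k + 1 positions.
        #    Simulated with a loop here; a real system batches this into one forward pass.
        verified = [main(out + proposal[:i]) for i in range(k + 1)]
        main_passes += 1
        # 3. Keep the longest prefix on which drafter and main model agree ...
        n_ok = 0
        while n_ok < k and proposal[n_ok] == verified[n_ok]:
            n_ok += 1
        out.extend(proposal[:n_ok])
        # ... then take the main model's own token at the first mismatch
        #     (or a bonus token if every draft was accepted).
        out.append(verified[n_ok])
    return out[:target_len], main_passes


prompt = [1, 2, 3]
fast, passes = speculative_decode(main_model, draft_model, prompt, n_new=20)
slow = greedy_decode(main_model, prompt, n_new=20)
print(fast == slow, passes)  # prints: True 5
```

Because every accepted token is checked against the main model's own greedy choice, the output matches plain greedy decoding exactly. The saving comes only from verifying several positions per main-model pass: 5 passes instead of 20 in this toy run. Sampling-based variants replace the equality check with a probabilistic accept/reject rule.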

The article Google speeds up Gemma 4 threefold with multi-token prediction appeared first on The Decoder.