Turns out Gemma 4 had MTP (multi token prediction) all along

Reddit r/LocalLLaMA / 4/7/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • A developer integrating Gemma 4 via the LiteRT API observed runtime errors related to “mtp weights being an incompatible tensor shape” on a Google Pixel 9 device.
  • Investigation suggested Gemma 4’s LiteRT package includes additional multi-token prediction (MTP) heads intended for speculative decoding and faster text generation.
  • The post claims a Google employee confirmed that Gemma 4 does have MTP, but it was “removed on purpose” to improve compatibility and broad usability across deployments.
  • The author speculates that reverse engineering LiteRT tensors and the compute graph might enable users to recover/repurpose the MTP functionality for faster outputs.

Hey everyone. While I was trying to use Gemma 4 through the LiteRT API in my Android app, I noticed that Gemma 4 threw errors when loading on my Google Pixel 9 test device about the "mtp weights being an incompatible tensor shape". I did some digging and found that there are additional MTP heads within the LiteRT files, intended for speculative decoding and much faster outputs.

Well, it turns out I got confirmation today from a Google employee that Gemma 4 DOES INDEED have MTP, but it was "removed on purpose" for "ensuring compatibility and broad usability".

Honestly, it would've been great if they released the full model instead, considering we also never got the Gemma 124B model that was accidentally leaked in Jeff Dean's tweet. Much faster Gemma 4 generation would have been great to have, ideally on top of the already fast MoE. Maybe someone can reverse engineer the tensors and the math based on the compute graph in LiteRT and extract the MTP functionality?
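For readers unfamiliar with why MTP heads speed things up: they let the model cheaply draft several tokens ahead, which a single full forward pass then verifies, so each expensive pass can yield more than one token. Below is a minimal toy sketch of that draft-then-verify loop. The functions `main_model` and `mtp_draft` are illustrative stubs standing in for the full model and the MTP heads; none of this reflects Gemma 4's or LiteRT's actual internals.

```python
# Toy sketch of MTP-style speculative decoding over integer "tokens".
# main_model = expensive full forward pass; mtp_draft = cheap MTP heads.
# Both are hypothetical stubs, not Gemma 4 / LiteRT internals.

def main_model(context):
    """Full forward pass: returns the 'true' next token (toy rule)."""
    return (sum(context) + 1) % 10

def mtp_draft(context, k):
    """MTP heads: cheaply draft k tokens ahead in one shot.
    This toy draft is correct except for a simulated error at step 3."""
    draft, ctx = [], list(context)
    for i in range(k):
        guess = (sum(ctx) + 1) % 10
        if i == 2:                      # simulate a draft mistake
            guess = (guess + 5) % 10
        draft.append(guess)
        ctx.append(guess)
    return draft

def speculative_decode(context, k):
    """Draft k tokens with the MTP heads, verify them against the main
    model, and accept the longest correct prefix; on the first mismatch,
    keep the main model's token instead. Every verification round thus
    yields at least one token, and up to k when the draft is right."""
    draft = mtp_draft(context, k)
    accepted, ctx = [], list(context)
    for guess in draft:
        true_tok = main_model(ctx)      # in practice: one batched pass
        accepted.append(true_tok)
        ctx.append(true_tok)
        if guess != true_tok:
            break                       # reject the rest of the draft
    return accepted

print(speculative_decode([1, 2, 3], k=4))  # → [7, 4, 8]
```

Here the first two drafted tokens match and are accepted for free, the third is rejected and replaced by the main model's output, so three tokens come out of what would otherwise be three separate full decode steps started from scratch.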

Here's a link to the conversation:

https://huggingface.co/google/gemma-4-E4B-it/discussions/5

submitted by /u/Electrical-Monitor27