| Hey everyone! In a previous post I mentioned that I'd found out Gemma 4 has MTP. It turns out I was able to extract the model weights, but now I need help from the community, especially people who know C++, to reverse engineer the MTP from the compiled TFLite graph files back into a usable PyTorch nn.Module. I have made a repo on Hugging Face with the extracted files, alongside replication steps and the clues I could find, which I linked here in the post.
|
Update on Gemma 4 having MTP: Reverse engineering effort
Reddit r/LocalLLaMA / 4/10/2026
💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- A community update claims Gemma 4 includes MTP (multi-token prediction) and that the author has extracted the model weights from the shipped TFLite artifacts.
- The next milestone is reverse-engineering the MTP logic from compiled TFLite graph files back into a working PyTorch `nn.Module`, with a request for C++ expertise to interpret the graph implementation.
- The extracted model appears to be INT8-quantized, and the author suggests it may be recoverable via de-quantization if Google used QAT (quantization-aware training).
- The author suggests using Google’s AI Edge Model Explorer to inspect and understand the TFLite graph structure, and references prior Gemini Nano extraction/conversion work as a potential guide.
- The author has published a Hugging Face repo with extraction outputs, replication steps, and a GraphDef JSON that might be used alongside an LLM to aid reverse engineering.
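If Google did use QAT, recovering float weights from the INT8 artifacts would reduce to standard affine dequantization, where each real value is reconstructed as `scale * (q - zero_point)` using the scale and zero point stored per tensor (or per channel) in the TFLite file. A minimal sketch of that arithmetic, with made-up values standing in for the actual quantization parameters:

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Affine dequantization: real = scale * (quantized - zero_point).

    This mirrors the TFLite quantization scheme; scale and zero_point
    here are illustrative placeholders, not values from the real model.
    """
    return scale * (q.astype(np.float32) - zero_point)

# Toy INT8 tensor; a real extraction would read these from the .tflite buffers.
q = np.array([-128, 0, 127], dtype=np.int8)
w = dequantize(q, scale=0.02, zero_point=0)  # float32 weights
```

Per-channel quantization works the same way, just with one `(scale, zero_point)` pair per output channel; whether the dequantized result is faithful depends on Google having trained with QAT, as the post notes.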
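The published GraphDef JSON can also be surveyed programmatically before diving into the C++ level: a quick pass over the node list (GraphDef JSON conventionally has a `"node"` array whose entries carry `"name"`, `"op"`, and `"input"` fields) reveals which op types appear and how they connect. A hedged sketch using an inline stand-in for the repo's actual JSON:

```python
import json

# Stand-in GraphDef-style JSON; the real file lives in the Hugging Face repo.
graph = json.loads("""
{"node": [
  {"name": "embed",    "op": "Gather", "input": ["ids"]},
  {"name": "mtp_head", "op": "MatMul", "input": ["embed", "w_mtp"]}
]}
""")

def ops_by_type(graph):
    """Count occurrences of each op type in a GraphDef-style node list."""
    counts = {}
    for node in graph.get("node", []):
        counts[node["op"]] = counts.get(node["op"], 0) + 1
    return counts
```

A histogram like this is a cheap first step for spotting where an extra prediction head diverges from the main decoder path, and the same JSON could be fed to an LLM alongside these summaries, as the author proposes.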