| Hey everyone! In a previous post I mentioned that I'd found out Gemma 4 has MTP. It turns out I was able to extract the model weights, but now I need help from the community, especially people who know C++, to reverse engineer the MTP from the compiled TFLite graph files back into a usable PyTorch nn.Module. I have made a repo on Hugging Face with the extracted files, alongside replication steps and the clues I could find, which I linked here in the post.
|
Update on Gemma 4 having MTP: Reverse engineering effort
Reddit r/LocalLLaMA / 4/10/2026
💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- A community update claims Gemma 4 includes MTP (multi-token prediction) and that the author has extracted the model weights from the shipped TFLite artifacts.
- The next milestone is reverse-engineering the MTP logic from compiled TFLite graph files back into a working PyTorch `nn.Module`, with a request for C++ expertise to interpret the graph implementation.
- The extracted model appears to be INT8-quantized, and the author suggests it may be recoverable via de-quantization if Google used QAT (quantization-aware training).
- The author suggests using Google’s AI Edge Model Explorer to inspect and understand the TFLite graph structure, and references prior Gemini Nano extraction/conversion work as a potential guide.
- The author has published a Hugging Face repo with extraction outputs, replication steps, and a GraphDef JSON that might be used alongside an LLM to aid reverse engineering.
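If Google did use QAT, recovering float weights from the INT8 artifacts would reduce to standard affine dequantization, where each real value is reconstructed as `scale * (q - zero_point)` using the scale and zero point stored per tensor (or per channel) in the TFLite file. A minimal sketch of that arithmetic, with made-up values standing in for the actual quantization parameters:

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Affine dequantization: real = scale * (quantized - zero_point).

    This mirrors the TFLite quantization scheme; scale and zero_point
    here are illustrative placeholders, not values from the real model.
    """
    return scale * (q.astype(np.float32) - zero_point)

# Toy INT8 tensor; a real extraction would read these from the .tflite buffers.
q = np.array([-128, 0, 127], dtype=np.int8)
w = dequantize(q, scale=0.02, zero_point=0)  # float32 weights
```

Per-channel quantization works the same way, just with one `(scale, zero_point)` pair per output channel; whether the dequantized result is faithful depends on Google having trained with QAT, as the post notes.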
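The published GraphDef JSON can also be surveyed programmatically before diving into the C++ level: a quick pass over the node list (GraphDef JSON conventionally has a `"node"` array whose entries carry `"name"`, `"op"`, and `"input"` fields) reveals which op types appear and how they connect. A hedged sketch using an inline stand-in for the repo's actual JSON:

```python
import json

# Stand-in GraphDef-style JSON; the real file lives in the Hugging Face repo.
graph = json.loads("""
{"node": [
  {"name": "embed",    "op": "Gather", "input": ["ids"]},
  {"name": "mtp_head", "op": "MatMul", "input": ["embed", "w_mtp"]}
]}
""")

def ops_by_type(graph):
    """Count occurrences of each op type in a GraphDef-style node list."""
    counts = {}
    for node in graph.get("node", []):
        counts[node["op"]] = counts.get(node["op"], 0) + 1
    return counts
```

A histogram like this is a cheap first step for spotting where an extra prediction head diverges from the main decoder path, and the same JSON could be fed to an LLM alongside these summaries, as the author proposes.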