Graph Memory Transformer (GMT)
arXiv cs.LG · April 28, 2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- The Graph Memory Transformer (GMT) explores replacing the FFN sublayer in a decoder-only transformer with an explicit learned memory graph while keeping causal self-attention and the autoregressive decoder structure.
- GMT routes token representations through a learned bank of centroids using a directed transition matrix, producing a “movement” from a source memory state to a target memory state rather than retrieving a value.
- In the studied base GMT v7 configuration, each of 16 transformer blocks uses 128 centroids and associated edge/transition structures, with a gated displacement readout that enables direct inspection of centroid usage and transition behavior (a minimal sketch of such a sublayer follows this list).
- The base GMT v7 is an 82.2M-parameter decoder-only language model without dense FFN sublayers, but it underperforms the 103.0M-parameter dense GPT-style baseline on validation loss and perplexity.
- The authors emphasize that results are not a state-of-the-art claim and position GMT as evidence that graph-mediated memory navigation can make within-token transformations more structurally interpretable, with scaling and broader evaluation left for future work.
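To make the routing and readout described above concrete, here is a minimal PyTorch sketch of a graph-memory sublayer that could stand in for an FFN block. The class name `GraphMemorySublayer`, the soft (softmax) centroid assignment, the linear gate and output projections, and all hyperparameters are assumptions for illustration; the paper's actual edge structure, gating, and training details may differ.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphMemorySublayer(nn.Module):
    """Illustrative FFN replacement: routes each token through a learned
    centroid bank and a directed transition matrix, then reads out a gated
    displacement (target memory state minus source memory state).
    This is a sketch of the idea, not the paper's exact implementation."""

    def __init__(self, d_model: int, num_centroids: int = 128):
        super().__init__()
        # Learned memory graph: K centroid vectors plus a K x K directed
        # transition score matrix over those centroids (hypothetical form).
        self.centroids = nn.Parameter(torch.randn(num_centroids, d_model) * 0.02)
        self.transitions = nn.Parameter(torch.zeros(num_centroids, num_centroids))
        # Gated displacement readout (assumed to be simple linear maps here).
        self.gate_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.d_model = d_model

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model)
        # 1) Soft-assign each token to source centroids.
        logits = h @ self.centroids.t() / math.sqrt(self.d_model)  # (B, S, K)
        p_src = F.softmax(logits, dim=-1)
        # 2) Route along the directed graph: each transition-matrix row is a
        #    distribution over target centroids.
        trans = F.softmax(self.transitions, dim=-1)                 # (K, K)
        p_tgt = p_src @ trans                                       # (B, S, K)
        # 3) Source and target memory states are mixtures of centroids.
        m_src = p_src @ self.centroids                              # (B, S, D)
        m_tgt = p_tgt @ self.centroids
        # 4) The "movement" is the displacement between the two states,
        #    passed through a token-conditioned gate.
        gate = torch.sigmoid(self.gate_proj(h))
        return gate * self.out_proj(m_tgt - m_src)
```

Used in place of the FFN, the sublayer would sit inside the usual residual structure, e.g. `x = x + sublayer(layer_norm(x))`. The intermediate distributions `p_src` and `p_tgt` are what make the design inspectable: they expose which centroids a token maps to and which transitions it takes, the kind of centroid-usage and transition behavior the key points describe.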