k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS
arXiv cs.LG / 4/7/2026
Key Points
- The paper proposes k-Maximum Inner Product (k-MIP) attention for graph transformers, which selects the top-k key nodes per query so that each node attends to only a small set of keys, avoiding the quadratic cost of all-to-all attention on large graphs (see the sketch after this list).
- By combining top-k sparsification with an attention score computation using symbolic matrices, k-MIP attention achieves linear memory complexity, with reported speedups of up to roughly 10× over all-to-all attention.
- The method enables processing graphs with over 500k nodes on a single NVIDIA A100 GPU while maintaining strong empirical performance on multiple benchmarks.
- The authors provide theoretical guarantees that k-MIP transformers can approximate any full-attention transformer to arbitrary precision; in the sense studied, the sparsification does not reduce expressive power (stated schematically below).
- The paper also analyzes the expressive capacity of the GraphGPS framework when equipped with this attention, linking performance to graph distinguishing power via the S-SEG-WL test, and validates the results on several datasets.
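To make the first two points concrete, here is a minimal PyTorch sketch of top-k inner-product attention. It is an illustration of the idea, not the authors' implementation: the function name `topk_mip_attention` is invented, and the full score matrix is materialized for clarity, whereas the point of k-MIP attention is to find each query's top-k keys via maximum-inner-product search without ever forming that matrix.

```python
# Minimal sketch of top-k inner-product (k-MIP-style) attention in PyTorch.
# Not the authors' code: for clarity it materializes the full (n, m) score
# matrix, which a real k-MIP implementation avoids by finding each query's
# top-k keys with a maximum-inner-product search.
import torch
import torch.nn.functional as F

def topk_mip_attention(Q, K, V, k):
    """Attend each query only to the k keys with the largest inner product.

    Q: (n, d) queries; K: (m, d) keys; V: (m, d) values. Returns (n, d).
    """
    d = Q.size(-1)
    scores = Q @ K.T / d ** 0.5                   # (n, m); dense, illustration only
    top_scores, top_idx = scores.topk(k, dim=-1)  # (n, k) best keys per query
    weights = F.softmax(top_scores, dim=-1)       # normalize over the top-k only
    top_vals = V[top_idx]                         # (n, k, d) gathered values
    return (weights.unsqueeze(-1) * top_vals).sum(dim=1)

# Toy usage: 6 nodes with 4-dim features, each attending to its top-2 keys.
Q, K, V = (torch.randn(6, 4) for _ in range(3))
out = topk_mip_attention(Q, K, V, k=2)
print(out.shape)  # torch.Size([6, 4])
```

Replacing the dense `scores` line with an (approximate) MIPS index is what would turn this O(nm)-memory sketch into the linear-memory scheme the paper describes.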
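The approximation guarantee in the fourth key point can be written schematically as follows. This is a paraphrase of the claim above, not the paper's exact theorem statement; the function classes, input domain, and norm are placeholders:

```latex
\forall f \in \mathcal{T}_{\text{full}},\ \forall \varepsilon > 0,\
\exists g \in \mathcal{T}_{k\text{-MIP}} \ \text{such that}\
\sup_{X \in \mathcal{X}} \lVert f(X) - g(X) \rVert < \varepsilon .
```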