Neighbourhood Transformer: Switchable Attention for Monophily-Aware Graph Learning

arXiv cs.LG / 4/13/2026


Key Points

  • The paper proposes Neighbourhood Transformers (NT), a new graph-learning paradigm that replaces message passing toward a central node with self-attention applied within each local neighbourhood, handling monophily more directly.
  • NT is argued to be inherently monophily-aware and to have theoretical expressiveness that is no weaker than traditional message-passing GNN frameworks.
  • To make the approach practical for large graphs, the authors introduce neighbourhood partitioning with switchable attentions, reporting space reductions of over 95% and time reductions up to 92.67%.
  • Experiments on 10 real-world datasets (including both heterophilic and homophilic graphs) show NT outperforming existing state-of-the-art methods on node classification and maintaining strong cross-domain adaptability.
  • The authors release full implementation code publicly (MoNT repository) to support reproducibility and potential industrial adoption.

Abstract

Graph neural networks (GNNs) have been widely adopted in engineering applications such as social network analysis, chemical research and computer vision. However, their efficacy is severely compromised by the inherent homophily assumption, which fails to hold for heterophilic graphs where dissimilar nodes are frequently connected. To address this fundamental limitation in graph learning, we first draw inspiration from the recently discovered monophily property of real-world graphs, and propose Neighbourhood Transformers (NT), a novel paradigm that applies self-attention within every local neighbourhood instead of aggregating messages to the central node as in conventional message-passing GNNs. This design makes NT inherently monophily-aware and theoretically guarantees its expressiveness is no weaker than traditional message-passing frameworks. For practical engineering deployment, we further develop a neighbourhood partitioning strategy equipped with switchable attentions, which reduces the space consumption of NT by over 95% and time consumption by up to 92.67%, significantly expanding its applicability to larger graphs. Extensive experiments on 10 real-world datasets (5 heterophilic and 5 homophilic graphs) show that NT outperforms all current state-of-the-art methods on node classification tasks, demonstrating its superior performance and cross-domain adaptability. The full implementation code of this work is publicly available at https://github.com/cf020031308/MoNT to facilitate reproducibility and industrial adoption.
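The abstract's central idea, attending among the nodes of each neighbourhood rather than aggregating messages into the central node, can be illustrated with a minimal sketch. This is not the paper's implementation: `neighbourhood_attention` is a hypothetical function using a single unparameterized attention head with no learned projections, and it averages the updates a node receives across all neighbourhoods it appears in.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def neighbourhood_attention(feats, adj):
    """Illustrative sketch (not the paper's method): for each node u, run
    scaled dot-product self-attention among the nodes of u's neighbourhood
    N(u), instead of aggregating messages into u itself.  Each node's final
    embedding is the average of the updates it receives across all the
    neighbourhoods it belongs to."""
    n, d = len(feats), len(feats[0])
    sums = [[0.0] * d for _ in range(n)]
    counts = [0] * n
    for u in range(n):
        hood = adj[u]  # neighbours of u attend to one another
        for i in hood:
            scores = softmax([dot(feats[i], feats[j]) / math.sqrt(d)
                              for j in hood])
            for k in range(d):
                sums[i][k] += sum(w * feats[j][k]
                                  for w, j in zip(scores, hood))
            counts[i] += 1
    out = []
    for i in range(n):
        if counts[i]:
            out.append([s / counts[i] for s in sums[i]])
        else:
            out.append(feats[i][:])  # isolated nodes keep input features
    return out
```

Under monophily, the neighbours of a node tend to resemble each other even when they differ from the central node, so letting them attend to one another (as above) exploits exactly that signal; the paper's switchable-attention partitioning then reduces the cost of running attention over every neighbourhood.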