Distance-Misaligned Training in Graph Transformers and Adaptive Graph-Aware Control

arXiv cs.AI / 4/27/2026

Key Points

  • The paper shows that Graph Transformers' capacity for global, long-range mixing also creates failure modes, because some tasks require long-range communication while others are better served by local interaction.
  • Using a synthetic node-classification benchmark on contextual stochastic block model (CSBM) graphs, where labels are generated by a controllable mixture of local and far-shell signals, the authors define “distance-misaligned training” as a mismatch between where label-relevant information lies over graph distance and where the model allocates communication.
  • The study finds that the preferred graph-distance bias changes systematically with the task's locality.
  • An “oracle” adaptive controller with offline access to the task-side distance target nearly matches the best fixed bias across regimes and substantially outperforms a neutral baseline on mixed and local tasks.
  • A task-agnostic “zero-gap” controller performs worse, suggesting that adaptation alone is not enough and that the choice of control target matters.
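To make “distance misalignment” concrete, here is a minimal Python sketch under stated assumptions. It takes a graph as an adjacency list and a row-normalized attention matrix, bins each query node's attention mass by shortest-path hop distance, and compares the resulting profile with a task-side target profile using total-variation distance. The function names, the pooling of far and unreachable nodes into the last shell, and the use of total variation as the mismatch measure are all hypothetical choices for illustration, not the paper's definitions.

```python
from collections import deque


def hop_distances(adj, src):
    """Breadth-first search: shortest-path hop count from src to each reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def attention_by_distance(adj, attn, max_shell):
    """Average attention mass that each query node places on each distance shell.

    attn[i][j] is the row-normalized attention weight from node i to node j.
    Hypothetical convention: shells beyond max_shell and unreachable nodes
    are pooled into the last bin.
    """
    n = len(adj)
    mass = [0.0] * (max_shell + 1)
    for i in range(n):
        dist = hop_distances(adj, i)
        for j in range(n):
            d = min(dist.get(j, max_shell), max_shell)
            mass[d] += attn[i][j]
    return [m / n for m in mass]


def misalignment(model_mass, target_mass):
    """Assumed mismatch measure: total-variation distance between two distance profiles."""
    return 0.5 * sum(abs(p - q) for p, q in zip(model_mass, target_mass))


# Usage: a 3-node path 0-1-2 with uniform attention, against a purely local target.
adj = [[1], [0, 2], [1]]
uniform = [[1 / 3] * 3 for _ in range(3)]
profile = attention_by_distance(adj, uniform, max_shell=2)
print(profile, misalignment(profile, [0.5, 0.5, 0.0]))
```

On this path graph the uniform attention profile works out to [1/3, 4/9, 2/9], so a local target of [0.5, 0.5, 0.0] gives a mismatch of 2/9: the model spends attention on the two-hop shell where, by assumption, no label signal lives.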

Abstract

Graph Transformers can mix information globally, but this flexibility also creates failure modes: some tasks require long-range communication while others are better served by local interaction. We study this through a synthetic node-classification benchmark on contextual stochastic block model graphs, where labels are generated by a controllable mixture of local and far-shell signals. We define distance-misaligned training as a mismatch between where label-relevant information lies and where the model allocates communication over graph distance. On this benchmark, we report three findings. First, the preferred graph-distance bias changes systematically with task locality. Second, an oracle adaptive controller, given offline access to the task-side distance target, nearly matches the best fixed bias across regimes and strongly improves over a neutral baseline on mixed and local tasks. Third, a task-agnostic zero-gap controller is weaker, indicating that adaptation alone is not enough and that the control target matters. These results suggest that distance-resolved diagnosis is useful for understanding Graph Transformer failures and for designing graph-aware control.
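The oracle-controller idea can also be sketched in Python, again under loud assumptions. Here the graph-distance bias is taken to be a single scalar beta that subtracts beta times the hop distance from every attention logit (a linear special case of a Graphormer-style shortest-path bias), and the oracle is assumed to receive the task-side target as a desired mean attended distance. The proportional update rule, the learning rate, and every function name are hypothetical; the abstract does not specify how the paper's controller or bias is parameterized.

```python
import math
from collections import deque


def hop_distances(adj, src):
    """Breadth-first search: shortest-path hop count from src to each reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def biased_attention(adj, scores, beta):
    """Row-wise softmax of scores[i][j] - beta * d(i, j).

    Assumed linear distance bias: beta > 0 favors nearby nodes, beta < 0 favors
    distant ones. Unreachable nodes are treated as being n hops away.
    """
    n = len(adj)
    attn = []
    for i in range(n):
        dist = hop_distances(adj, i)
        logits = [scores[i][j] - beta * dist.get(j, n) for j in range(n)]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - top) for x in logits]
        z = sum(exps)
        attn.append([e / z for e in exps])
    return attn


def mean_attended_distance(adj, attn):
    """Average over query nodes of the expected hop distance under their attention."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        dist = hop_distances(adj, i)
        total += sum(attn[i][j] * dist.get(j, n) for j in range(n))
    return total / n


def oracle_controller(adj, scores, target_distance, beta=0.0, lr=0.5, steps=200):
    """Hypothetical proportional controller driven by an offline (oracle) distance target.

    If attention reaches farther than the target, beta grows (more local);
    if it stays too close, beta shrinks (more long-range).
    """
    for _ in range(steps):
        attn = biased_attention(adj, scores, beta)
        gap = mean_attended_distance(adj, attn) - target_distance
        beta += lr * gap
    return beta


# Usage: a 5-node path graph with content scores fixed at zero.
adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
scores = [[0.0] * 5 for _ in range(5)]
beta = oracle_controller(adj, scores, target_distance=1.0)
print(beta, mean_attended_distance(adj, biased_attention(adj, scores, beta)))
```

Because the expected attended distance decreases monotonically as beta grows, this simple loop settles where the gap closes: a positive beta for a local target and a negative one for a far target. The loop itself is generic; where attention ends up is determined entirely by target_distance, which is one way to read the abstract's point that the control target, not adaptation alone, matters.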