Inductive Subgraphs as Shortcuts: Causal Disentanglement for Heterophilic Graph Learning

arXiv cs.LG / 4/22/2026

Key Points

  • The paper studies why heterophily (edges connecting dissimilar nodes) degrades the performance of GNNs built on a homophily assumption, a failure that earlier work did not fully explain.
  • It argues that recurring inductive subgraphs act as spurious “shortcuts”: GNNs learn to rely on the non-causal correlations these subgraphs carry, which leads to misclassifications.
  • Using a causal inference framework, the authors propose a debiased causal graph that blocks confounding and spillover paths responsible for these shortcut behaviors.
  • Based on this causal graph, they introduce Causal Disentangled GNN (CD-GNN), which disentangles spurious inductive subgraphs from true causal subgraphs by explicitly blocking non-causal paths.
  • Experiments on real-world heterophilic graph datasets show CD-GNN improves robustness and node classification accuracy and outperforms existing heterophily-aware methods.
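
To make the notion of heterophily in the key points concrete, here is a minimal Python sketch of the edge homophily ratio, a common way to quantify how homophilic or heterophilic a labeled graph is. It is not taken from the paper; the function name `edge_homophily` and the toy graph are assumptions made for illustration only.

```python
def edge_homophily(edges, labels):
    """Return the fraction of edges whose two endpoints share a label.

    Values near 1.0 indicate a homophilic graph; values near 0.0
    indicate a heterophilic graph (edges mostly join dissimilar nodes).
    """
    if not edges:
        raise ValueError("graph has no edges")
    same_label = sum(1 for u, v in edges if labels[u] == labels[v])
    return same_label / len(edges)


# Hypothetical toy graph: 4 nodes, 2 classes, 4 undirected edges.
labels = [0, 1, 0, 1]
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]

ratio = edge_homophily(edges, labels)
print(ratio)  # only edge (0, 2) joins same-label nodes, so this prints 0.25
```

A low ratio such as this one describes the heterophilic setting the paper targets, where aggregating over immediate neighbors mostly mixes in features from nodes of other classes.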

Abstract

Heterophily is a prevalent property of real-world graphs and is well known to impair the performance of homophilic Graph Neural Networks (GNNs). Prior work has attempted to adapt GNNs to heterophilic graphs through non-local neighbor extension or architecture refinement. However, the fundamental reasons behind misclassifications remain poorly understood. In this work, we take a novel perspective by examining recurring inductive subgraphs, empirically and theoretically showing that they act as spurious shortcuts that mislead GNNs and reinforce non-causal correlations in heterophilic graphs. To address this, we adopt a causal inference perspective to analyze and correct the biased learning behavior induced by shortcut inductive subgraphs. We propose a debiased causal graph that explicitly blocks confounding and spillover paths responsible for these shortcuts. Guided by this causal graph, we introduce Causal Disentangled GNN (CD-GNN), a principled framework that disentangles spurious inductive subgraphs from true causal subgraphs by explicitly blocking non-causal paths. By focusing on genuine causal signals, CD-GNN substantially improves the robustness and accuracy of node classification in heterophilic graphs. Extensive experiments on real-world datasets not only validate our theoretical findings but also demonstrate that our proposed CD-GNN outperforms state-of-the-art heterophily-aware baselines.
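
The abstract does not say which adjustment the authors use to block confounding paths. As general background only, and not as a statement of the CD-GNN method, the standard backdoor adjustment from causal inference shows what blocking a confounding path means. The symbols are assumptions for illustration: $X$ is the input the model relies on, $Y$ the node label, and $Z$ a set of confounders satisfying the backdoor criterion.

```latex
P\bigl(Y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{z} P\bigl(Y \mid X = x,\, Z = z\bigr)\, P(Z = z)
```

Averaging over $Z$ with its marginal distribution, rather than its distribution conditional on $X$, removes the spurious association that flows through the path $X \leftarrow Z \rightarrow Y$.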