GRAFT: Auditing Graph Neural Networks via Global Feature Attribution

arXiv cs.LG / 5/6/2026


Key Points

  • GRAFT is a post-hoc framework designed to globally explain Graph Neural Networks by identifying which input node features influence predictions at the class level.
  • Unlike existing global GNN explainers that focus on structural motifs (subgraphs), GRAFT targets feature-level importance profiles derived from input attributes.
  • The method combines diversity-guided exemplar selection, Integrated Gradients-style attribution, and aggregation to produce a global, per-class view of feature influence (see the sketch after this list).
  • GRAFT can generate concise natural-language rules describing feature behavior by using a large language model with self-refinement (illustrated after the abstract), and the paper includes a structured human evaluation protocol to judge rule interpretability.
  • Experiments across multiple datasets and architectures show that GRAFT effectively captures model-relevant features, supports bias analysis, and can improve feature-efficient transfer learning.
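
For concreteness, the sketch below shows one way the attribution pipeline could look in code: greedy farthest-point exemplar selection over node embeddings, a standard Integrated Gradients approximation on node features, and a per-class mean as the aggregation step. It assumes a PyTorch Geometric-style `model(x, edge_index)` that returns per-node logits; the selection criterion, zero baseline, step count, and mean aggregation are illustrative assumptions, not GRAFT's exact choices.

```python
import torch

def diverse_exemplars(embeddings, class_idx, k=16):
    """Greedy farthest-point selection of k spread-out nodes from one class.

    A stand-in for GRAFT's diversity-guided exemplar selection; the paper's
    exact diversity criterion is not reproduced here.
    """
    emb = embeddings[class_idx]                      # (n_class, d) embeddings
    chosen = [0]                                     # start from an arbitrary node
    dists = torch.cdist(emb[:1], emb).squeeze(0)     # distances to the chosen set
    for _ in range(k - 1):
        nxt = int(dists.argmax())                    # farthest from chosen set
        chosen.append(nxt)
        dists = torch.minimum(dists, torch.cdist(emb[nxt:nxt + 1], emb).squeeze(0))
    return class_idx[chosen]

def integrated_gradients(model, x, edge_index, node_idx, target_class, steps=32):
    """Riemann approximation of Integrated Gradients for one node's features,
    using an all-zero feature baseline (an assumption)."""
    baseline = torch.zeros_like(x)
    total_grads = torch.zeros_like(x)
    for step in range(1, steps + 1):
        alpha = step / steps
        interp = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        logits = model(interp, edge_index)           # per-node class logits
        logits[node_idx, target_class].backward()
        total_grads += interp.grad
    return (x - baseline) * total_grads / steps      # feature-wise attributions

def class_feature_profile(model, x, edge_index, exemplars, target_class):
    """Aggregate exemplar attributions into one global per-class profile."""
    rows = [integrated_gradients(model, x, edge_index, int(i), target_class)[int(i)]
            for i in exemplars]
    return torch.stack(rows).mean(dim=0)             # mean importance per feature
```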

Abstract

Graph Neural Networks (GNNs) achieve strong performance on node classification tasks but remain difficult to interpret, particularly with respect to which input features drive their predictions. Existing global GNN explainers operate at the structural level, identifying recurring subgraph motifs, but none explain model behaviour globally at the level of input node attributes. We propose GRAFT, a post-hoc global explanation framework that identifies class-level feature importance profiles for GNNs. The method combines diversity-guided exemplar selection, Integrated Gradients-based attribution, and aggregation to construct a global view of feature influence for each class, which can be further expressed as concise natural-language rules using a large language model with self-refinement. We evaluate GRAFT across multiple datasets, architectures, and experimental settings, demonstrating its effectiveness in capturing model-relevant features, supporting bias analysis, and enabling feature-efficient transfer learning. In addition, we introduce a structured human evaluation protocol to assess the interpretability of generated rules along dimensions such as accuracy and usefulness. Our results suggest that GRAFT provides a practical and interpretable approach for analysing feature-level behaviour in GNNs, bridging quantitative attribution with human-understandable explanations.
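
The rule-generation step described above pairs these attribution profiles with an LLM self-refinement loop. The snippet below is a minimal illustration of such a loop, assuming only a generic text-completion callable `complete(prompt) -> str`; the prompt wording and the `rounds` parameter are invented for illustration and are not the paper's actual prompts or protocol.

```python
def rules_with_self_refinement(complete, class_name, top_features, rounds=2):
    """Draft natural-language rules from a class's feature-importance profile,
    then have the model critique and revise its own draft.

    `complete` is any prompt -> text callable (e.g., a chat-model wrapper);
    `top_features` is a list of (feature_name, importance_score) pairs.
    """
    feats = ", ".join(f"{name} ({score:+.2f})" for name, score in top_features)
    rules = complete(
        f"Write concise rules describing when a GNN predicts class "
        f"'{class_name}', given these feature importances: {feats}."
    )
    for _ in range(rounds):                          # self-refinement loop
        critique = complete(
            f"Critique these rules for faithfulness to the importances "
            f"({feats}) and for clarity:\n{rules}"
        )
        rules = complete(
            f"Revise the rules to address the critique.\n"
            f"Rules:\n{rules}\nCritique:\n{critique}"
        )
    return rules
```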