Operational Feature Fingerprints of Graph Datasets via a White-Box Signal-Subspace Probe

arXiv cs.LG / 4/27/2026


Key Points

  • The paper introduces WG-SRC, a white-box probe for diagnosing which graph-learning mechanisms a node-classification dataset requires, because learned message passing in graph neural networks entangles those mechanisms in an opaque representation.
  • WG-SRC replaces learned message passing with an explicit graph-signal dictionary (raw features plus low-pass and high-pass propagation terms) and performs classification via Fisher coordinate selection, class-wise PCA subspaces, multi-alpha closed-form ridge decisions, and validation-based score fusion.
  • Across six node-classification datasets, WG-SRC is reported to remain competitive with reproduced graph baselines and to achieve a positive average gain under aligned splits.
  • The method outputs “operational feature fingerprints” that decompose predictions into components such as raw-feature effects, low-pass/high-pass contributions, class-geometric structure, and ridge-boundary behavior.
  • These fingerprints are used to guide dataset- or mechanism-specific follow-up actions (e.g., whether high-pass blocks behave like removable noise, whether raw features should be preserved, and when ridge-type boundary correction matters).
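
The fixed graph-signal dictionary named in the key points (raw features, low-pass propagation, high-pass differences) could be built roughly as in the sketch below. This is an illustration only, not the paper's implementation: the function name `build_signal_dictionary`, the block names, the use of a single propagation hop, the absence of self-loops, and the definition of the high-pass block as "features minus their propagated version" are all assumptions made here for the example.

```python
import numpy as np

def build_signal_dictionary(A, X):
    """Return named feature blocks for a graph with adjacency A and features X.

    Hypothetical sketch: one-hop propagation, no self-loops added.
    """
    A = np.asarray(A, dtype=float)
    X = np.asarray(X, dtype=float)
    deg = A.sum(axis=1)

    # Inverse degrees; isolated nodes (degree 0) get weight 0 instead of inf.
    inv_deg = np.where(deg > 0, 1.0 / np.maximum(deg, 1e-12), 0.0)
    inv_sqrt = np.sqrt(inv_deg)

    P_row = inv_deg[:, None] * A                       # D^{-1} A
    P_sym = inv_sqrt[:, None] * A * inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}

    low_row = P_row @ X
    low_sym = P_sym @ X
    return {
        "raw": X,                 # ego attributes, untouched
        "low_row": low_row,       # neighborhood average
        "low_sym": low_sym,       # symmetric-normalized smoothing
        "high_row": X - low_row,  # difference from neighborhood average
        "high_sym": X - low_sym,  # difference from symmetric smoothing
    }
```

Because every block keeps its name, a downstream classifier built on `np.hstack(list(blocks.values()))` can report which block each selected coordinate came from, which is the property the fingerprints rely on.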

Abstract

Graph neural networks achieve strong node-classification accuracy, but their learned message passing entangles ego attributes, neighborhood smoothing, high-pass graph differences, class geometry, and classifier boundaries in an opaque representation. This obscures why a node is classified and what feature-level graph-learning mechanisms a dataset requires. We propose WG-SRC, a white-box signal-subspace probe for prediction and graph dataset diagnosis. WG-SRC replaces learned message passing with a fixed, named graph-signal dictionary of raw features, row-normalized and symmetric-normalized low-pass propagation, and high-pass graph differences. It combines Fisher coordinate selection, class-wise PCA subspaces, closed-form multi-alpha ridge classification, and validation-based score fusion, so prediction and analysis use explicit class subspaces, energy-controlled dimensions, and closed-form linear decisions. As a white-box graph-learning instrument, WG-SRC uses predictive performance to validate its diagnostics: across six node-classification datasets, the scaffold remains competitive with reproduced graph baselines and achieves positive average gain under aligned splits. Its atlas, produced by a predictor, decomposes behavior into raw-feature, low-pass, high-pass, class-geometric, and ridge-boundary components. These operational feature fingerprints distinguish low-pass-dominated Amazon graphs, mixed high-pass and class-geometrically complex Chameleon behavior, and raw- or boundary-sensitive WebKB graphs. As intrinsic classifier outputs rather than post-hoc explanations, these fingerprints provide post-evaluation guidance for later analysis and dataset-specific modification. Aligned mechanistic interventions support this guidance by indicating when high-pass blocks act as removable noise, when raw features should be preserved, and when ridge-type boundary correction matters.
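
The classification stages listed in the abstract (Fisher coordinate selection, class-wise PCA subspaces with energy-controlled dimensions, closed-form multi-alpha ridge, and validation-based score fusion) can be sketched as below. The abstract does not give the exact scoring rules, so everything specific here is an assumption: the function names, the per-coordinate Fisher ratio, scoring a node by its squared residual to each class subspace, averaging ridge scores over the alpha values, regularizing the bias term, and fusing with a single weight picked by validation accuracy.

```python
import numpy as np

def fisher_scores(F, y):
    """Per-coordinate ratio of between-class to within-class scatter."""
    mu = F.mean(axis=0)
    between = np.zeros(F.shape[1])
    within = np.zeros(F.shape[1])
    for c in np.unique(y):
        Fc = F[y == c]
        mc = Fc.mean(axis=0)
        between += len(Fc) * (mc - mu) ** 2
        within += ((Fc - mc) ** 2).sum(axis=0)
    return between / (within + 1e-12)

def class_subspaces(F, y, energy=0.9):
    """For each class: its mean and the leading PCA directions that
    capture at least `energy` of that class's variance."""
    out = {}
    for c in np.unique(y):
        Fc = F[y == c]
        mean = Fc.mean(axis=0)
        _, s, Vt = np.linalg.svd(Fc - mean, full_matrices=False)
        var = s ** 2
        if var.sum() <= 0:
            k = 0
        else:
            ratio = np.cumsum(var) / var.sum()
            k = min(int(np.searchsorted(ratio, energy)) + 1, len(s))
        out[c] = (mean, Vt[:k])
    return out

def subspace_residuals(F, subspaces):
    """Squared distance from each row of F to each class subspace.
    Columns follow the sorted class labels; smaller means closer."""
    cols = []
    for c in sorted(subspaces):
        mean, V = subspaces[c]
        D = F - mean
        cols.append(((D - D @ V.T @ V) ** 2).sum(axis=1))
    return np.stack(cols, axis=1)

def ridge_scores(F_train, y_train, F_test, alphas=(0.1, 1.0, 10.0)):
    """Closed-form ridge on one-hot targets, averaged over several alphas.
    The bias column is regularized too, which keeps the sketch short."""
    classes = np.unique(y_train)
    Y = (y_train[:, None] == classes[None, :]).astype(float)
    Ztr = np.hstack([F_train, np.ones((len(F_train), 1))])
    Zte = np.hstack([F_test, np.ones((len(F_test), 1))])
    gram, rhs = Ztr.T @ Ztr, Ztr.T @ Y
    eye = np.eye(gram.shape[0])
    scores = [Zte @ np.linalg.solve(gram + a * eye, rhs) for a in alphas]
    return np.mean(scores, axis=0)

def pick_fusion_weight(ridge_val, resid_val, y_val_idx,
                       weights=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Choose w maximizing validation accuracy of
    w * ridge_score - (1 - w) * residual; ties keep the earliest w.
    y_val_idx holds class indices aligned with the score columns."""
    best_w, best_acc = weights[0], -1.0
    for w in weights:
        fused = w * ridge_val - (1.0 - w) * resid_val
        acc = float((fused.argmax(axis=1) == y_val_idx).mean())
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w
```

In a pipeline of this shape, the selected coordinates, the per-class subspace dimensions, and the chosen fusion weight are all explicit quantities that can be read off after fitting, which is the sense in which diagnostics come from the classifier itself rather than from a post-hoc explainer. In practice the ridge scores and residuals would likely need rescaling to a common range before fusion; that step is omitted here.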
