AI Navigate

A Systematic Evaluation Protocol of Graph-Derived Signals for Tabular Machine Learning

arXiv cs.AI / March 17, 2026


Key Points

  • The paper argues that current studies of graph-derived signals in tabular learning rely on limited experimental setups and lack reliability analyses, and it introduces a taxonomy-driven empirical analysis of such signals.
  • It presents a unified, reproducible evaluation protocol to assess which categories of graph-derived signals yield statistically significant and robust improvements. The protocol offers an extensible setup for integrating signals into tabular learning pipelines and incorporates automated hyperparameter optimization, multi-seed evaluation, formal significance testing, and robustness analysis under graph perturbations.
  • The protocol is demonstrated through a large-scale, imbalanced cryptocurrency fraud detection case study, identifying signal categories that provide consistently reliable gains and offering interpretable insights into fraud-discriminative structural patterns.
  • Robustness analyses show pronounced differences in how various signals handle missing or corrupted relational data, underscoring practical utility for fraud detection and applicability to other domains.
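The multi-seed evaluation with formal significance testing mentioned in the key points can be sketched as follows. This is a minimal illustration, not the paper's implementation: it compares a tabular-only model against the same model augmented with one graph-derived feature across several seeds, then applies a paired Wilcoxon signed-rank test to the per-seed AUC scores. The synthetic data, the choice of logistic regression, and all function names (`make_data`, `evaluate`) are assumptions for the sketch.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def make_data(rng, n=2000):
    # 5 plain tabular features plus 1 graph-derived feature
    # (e.g., a centrality score), here simulated for illustration.
    X_tab = rng.normal(size=(n, 5))
    g_feat = rng.normal(size=(n, 1))
    logits = X_tab[:, 0] + 1.5 * g_feat[:, 0]
    y = (logits + rng.normal(size=n) > 0).astype(int)
    return X_tab, g_feat, y

def evaluate(seeds=range(10)):
    """Multi-seed comparison: baseline vs. graph-augmented features."""
    base_scores, aug_scores = [], []
    for seed in seeds:
        rng = np.random.default_rng(seed)
        X_tab, g_feat, y = make_data(rng)
        X_aug = np.hstack([X_tab, g_feat])
        idx = np.arange(len(y))
        tr, te = train_test_split(idx, test_size=0.3,
                                  random_state=seed, stratify=y)
        for X, bucket in ((X_tab, base_scores), (X_aug, aug_scores)):
            clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
            bucket.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
    # Paired one-sided test: does the augmented model beat the baseline?
    _, p_value = wilcoxon(aug_scores, base_scores, alternative="greater")
    return base_scores, aug_scores, p_value
```

A Wilcoxon signed-rank test is used here because the paired per-seed scores are few and not necessarily normally distributed; the paper's protocol may use a different test.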

Abstract

While graph-derived signals are widely used in tabular learning, existing studies typically rely on limited experimental setups and average performance comparisons, leaving the statistical reliability and robustness of observed gains largely unexplored. Consequently, it remains unclear which signals provide consistent and robust improvements. This paper presents a taxonomy-driven empirical analysis of graph-derived signals for tabular machine learning. We propose a unified and reproducible evaluation protocol to systematically assess which categories of graph-derived signals yield statistically significant and robust performance improvements. The protocol provides an extensible setup for the controlled integration of diverse graph-derived signals into tabular learning pipelines. To ensure a fair and rigorous comparison, it incorporates automated hyperparameter optimization, multi-seed statistical evaluation, formal significance testing, and robustness analysis under graph perturbations. We demonstrate the protocol through an extensive case study on a large-scale, imbalanced cryptocurrency fraud detection dataset. The analysis identifies signal categories providing consistently reliable performance gains and offers interpretable insights into which graph-derived signals indicate fraud-discriminative structural patterns. Furthermore, robustness analyses reveal pronounced differences in how various signals handle missing or corrupted relational data. These findings demonstrate practical utility for fraud detection and illustrate how the proposed taxonomy-driven evaluation protocol can be applied in other application domains.
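The robustness analysis under graph perturbations described in the abstract can be illustrated with a small sketch: remove a random fraction of edges (simulating missing relational data) and measure how much a graph-derived signal degrades. The use of `networkx`, node degree as the signal, a Barabási–Albert graph, and correlation as the degradation measure are all assumptions of this sketch, not details from the paper.

```python
import networkx as nx
import numpy as np

def perturb_edges(G, drop_frac, rng):
    """Return a copy of G with a random fraction of edges removed,
    simulating missing or corrupted relational data."""
    H = G.copy()
    edges = list(H.edges())
    drop_idx = rng.choice(len(edges), size=int(drop_frac * len(edges)),
                          replace=False)
    H.remove_edges_from(edges[i] for i in drop_idx)
    return H

def degree_signal(G):
    # One simple graph-derived signal: node degree, ordered by node id,
    # ready to be appended as a column to a tabular feature matrix.
    return np.array([d for _, d in sorted(G.degree())])

rng = np.random.default_rng(0)
G = nx.barabasi_albert_graph(500, 3, seed=0)
clean = degree_signal(G)
for frac in (0.1, 0.3, 0.5):
    noisy = degree_signal(perturb_edges(G, frac, rng))
    corr = np.corrcoef(clean, noisy)[0, 1]
    print(f"dropped {frac:.0%} of edges -> degree correlation {corr:.3f}")
```

Running the same loop over several signal categories (e.g., centralities, embeddings, aggregates) would surface the "pronounced differences" in perturbation sensitivity that the abstract reports.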