Shapley Value-Guided Adaptive Ensemble Learning for Explainable Financial Fraud Detection with U.S. Regulatory Compliance Validation

arXiv cs.LG / April 17, 2026


Key Points

  • The paper addresses a key barrier to deploying AI-based fraud detection in the U.S.: black-box models that cannot provide the transparent, auditable explanations required under regulations such as OCC Bulletin 2011-12 and Federal Reserve SR 11-7.
  • It evaluates explanation quality using faithfulness (sufficiency and comprehensiveness at k=5/10/15) and stability (Kendall’s W over 30 bootstrap samples), finding that XGBoost with TreeExplainer provides near-perfect stability (W=0.9912) while LSTM with DeepExplainer is much weaker (W=0.4962).
  • It proposes the SHAP-Guided Adaptive Ensemble (SGAE), which adaptively sets per-transaction ensemble weights based on SHAP attribution agreement, achieving the best predictive performance with AUC-ROC of 0.8837 on held-out data and 0.9245 under cross-validation.
  • Using the full 590,540-transaction IEEE-CIS dataset, the study compares LSTM, Transformer, and GNN-GraphSAGE and reports GNN-GraphSAGE as strongest among architectures (AUC-ROC 0.9248, F1=0.6013).
  • The authors directly map their explanation and validation results to U.S. regulatory compliance needs across OCC, SR 11-7, and BSA-AML frameworks.
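The stability metric cited above, Kendall's coefficient of concordance W over bootstrap resamples of feature-importance rankings, can be sketched as follows. The helper names and the simulated importance scores are illustrative only; the paper's exact bootstrap and ranking procedure is not specified in this summary.

```python
import numpy as np

def kendalls_w(rank_matrix: np.ndarray) -> float:
    """Kendall's coefficient of concordance W.

    rank_matrix: (m, n) array where row i holds the ranks (1..n) that
    bootstrap run i assigns to the n features.  W = 1 means every run
    ranks the features identically; W near 0 means no agreement.
    """
    m, n = rank_matrix.shape
    rank_sums = rank_matrix.sum(axis=0)              # column totals R_j
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()  # squared deviations
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

def ranks_from_importance(importance: np.ndarray) -> np.ndarray:
    """Rank features per run (1 = most important) from importance scores."""
    order = np.argsort(-importance, axis=1)          # descending importance
    ranks = np.empty_like(order)
    rows = np.arange(importance.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, importance.shape[1] + 1)
    return ranks

# Simulated mean-|SHAP| scores for 10 features across 30 bootstrap runs.
rng = np.random.default_rng(0)
base = np.linspace(2.0, 0.1, 10)                     # a "true" importance profile
stable = np.abs(base + 0.05 * rng.standard_normal((30, 10)))
noisy = np.abs(base + 1.5 * rng.standard_normal((30, 10)))
print(kendalls_w(ranks_from_importance(stable)))     # close to 1
print(kendalls_w(ranks_from_importance(noisy)))      # much lower
```

On this toy data the low-noise attributions score near the paper's XGBoost/TreeExplainer regime (W≈0.99), while heavily perturbed attributions fall toward the LSTM/DeepExplainer regime.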

Abstract

Financial crime costs U.S. institutions over $32 billion each year. Although AI tools for fraud detection have become more advanced, their use in real-world systems still faces a major obstacle: many of these models operate as black boxes that cannot provide the transparent, auditable explanations required by regulations such as OCC Bulletin 2011-12 and Federal Reserve SR 11-7. This study makes three main contributions. First, it offers a thorough evaluation of explanation quality across faithfulness (sufficiency and comprehensiveness at k=5, 10, and 15) and stability (Kendall's W across 30 bootstrap samples). XGBoost paired with TreeExplainer achieves near-perfect stability (W=0.9912), while LSTM with DeepExplainer shows weak results (W=0.4962). Second, the paper introduces the SHAP-Guided Adaptive Ensemble (SGAE), which dynamically adjusts per-transaction ensemble weights based on SHAP attribution agreement, achieving the highest AUC-ROC among all tested models (0.8837 held-out; 0.9245 cross-validation). Third, a complete three-architecture evaluation of LSTM, Transformer, and GNN-GraphSAGE on the full 590,540-transaction IEEE-CIS dataset is provided, with GNN-GraphSAGE achieving AUC-ROC 0.9248 and F1=0.6013. All results are mapped directly to OCC, SR 11-7, and BSA-AML regulatory compliance requirements.
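The abstract does not detail how SGAE turns SHAP attribution agreement into per-transaction weights. One plausible reading, sketched below purely as an assumption (agreement measured as Spearman correlation with a consensus attribution, softmax-normalized into weights; all function and variable names are hypothetical), is:

```python
import numpy as np
from scipy.stats import spearmanr

def sgae_predict(probs: np.ndarray, shap_values: np.ndarray,
                 temperature: float = 1.0):
    """Blend model probabilities with per-transaction SHAP-agreement weights.

    probs:       (M,) predicted fraud probabilities, one per base model.
    shap_values: (M, F) each model's SHAP attributions for this transaction.
    A model whose attribution profile correlates with the consensus
    (mean attribution across models) receives a higher weight.
    """
    consensus = shap_values.mean(axis=0)
    agreement = np.array([spearmanr(sv, consensus)[0] for sv in shap_values])
    weights = np.exp(agreement / temperature)        # softmax over agreement
    weights /= weights.sum()
    return float(weights @ probs), weights

# Three base models score one transaction; the third model's attributions
# disagree with the other two, so it is down-weighted.
probs = np.array([0.82, 0.77, 0.30])
shap = np.array([
    [0.50, 0.30, -0.20, 0.10],
    [0.45, 0.35, -0.15, 0.05],
    [-0.40, 0.10, 0.50, -0.30],   # outlier attribution profile
])
p, weights = sgae_predict(probs, shap)
```

The design intuition matches the summary: when a model's explanation diverges from the ensemble consensus on a given transaction, its vote on that transaction counts for less.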