Graph Reconstruction from Differentially Private GNN Explanations

arXiv cs.LG / 5/6/2026


Key Points

  • The paper argues that differential privacy (DP) alone does not protect graph neural network (GNN) post-hoc explanations, since an adversary can reconstruct hidden graph structure from DP-perturbed explanations with high accuracy.
  • It introduces an attack called PRIVX that exploits the fact that the Gaussian DP mechanism is a single DDPM forward step at a known noise level, framing reconstruction as reverse diffusion: a Bayesian denoiser conditioned on the corrupted signal (the identification is sketched after this list).
  • The authors formalise a stratified adversary model, parameterised by (M, ε̂, δ̂, S, ρ), that interpolates between oblivious and oracle attackers, and derive endpoint-matched two-sided bounds on reconstruction AUC.
  • Practical guidance is given: on homophilic graphs, neighbourhood-aggregating explainers (e.g., GraphLIME, GNNExplainer) leak more structure than per-node gradient explainers under the same DP budget, while on strongly heterophilic graphs the ordering reverses.
  • An auxiliary diagnostic, PRIVF, shares the attack's diffusion backbone and separates leakage attributable to explainer design from leakage intrinsic to the underlying graph distribution; experiments validate the attack across seven benchmarks, three DP mechanisms, and three GNN backbones.
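
To make the noise-matching idea concrete, the following is one standard way to identify the Gaussian mechanism with a DDPM forward step. It assumes the classical Gaussian calibration of σ and the usual cumulative signal-retention parameterisation ᾱ_t; the paper's exact construction may differ.

```latex
% DP release of an explanation vector x_0 with L2 sensitivity \Delta_2
% (classical calibration, formally proven for \epsilon < 1; an assumption here):
\begin{align*}
  y &= x_0 + \sigma(\epsilon,\delta)\, z, \qquad z \sim \mathcal{N}(0, I),
  \qquad
  \sigma(\epsilon,\delta) = \frac{\Delta_2 \sqrt{2\ln(1.25/\delta)}}{\epsilon}, \\
  x_t &= \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, z
  \quad \Longrightarrow \quad
  \frac{x_t}{\sqrt{\bar\alpha_t}}
  = x_0 + \sqrt{\frac{1-\bar\alpha_t}{\bar\alpha_t}}\, z.
\end{align*}
% Picking the timestep $t^\star$ with
% $(1-\bar\alpha_{t^\star})/\bar\alpha_{t^\star} = \sigma(\epsilon,\delta)^2$
% makes the DP output a rescaled forward-diffused sample at a known step, so
% running a learned reverse chain from $t^\star$, conditioned on $y$, acts as
% a Bayesian denoiser for the DP corruption.
```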

Abstract

Regulatory frameworks such as GDPR increasingly require that ML predictions be accompanied by post-hoc explanations, even when raw data and trained models cannot be released. Differential privacy (DP) is the standard mitigation for the residual privacy risk of releasing these explanations. We show that DP is not sufficient: an adversary observing only DP-perturbed GNN explanations can reconstruct hidden graph structure with high accuracy. Our attack, PRIVX, exploits the fact that the Gaussian DP mechanism is a single DDPM forward step at known noise level σ(ε), recasting reconstruction as reverse diffusion conditioned on the corrupted signal: a principled Bayesian denoiser under known DP corruption. We formalise a stratified adversary model parameterised by (M, ε̂, δ̂, S, ρ) that interpolates between oblivious and oracle attackers, and derive endpoint-matched two-sided bounds on reconstruction AUC. For practitioners, we provide regime-stratified guidance on explainer choice: on homophilic graphs, neighbourhood-aggregating explainers (GraphLIME, GNNExplainer) leak more structure than per-node gradient explainers under the same DP budget; on strongly heterophilic graphs the ordering reverses. We introduce PRIVF as an auxiliary diagnostic sharing the same diffusion backbone to decompose leakage into explainer-induced and intrinsic graph-distribution components. Experiments across seven benchmarks, three DP mechanisms, and three GNN backbones show PRIVX achieves AUC above 0.7 at ε = 5 on five of seven datasets, with the attack succeeding well within typically deployed privacy budgets.
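
As a rough, self-contained illustration of the pipeline (not the authors' implementation), the sketch below calibrates the Gaussian noise scale from (ε, δ), finds the matched diffusion timestep, and measures how much edge structure survives the DP noise via AUC. The per-edge score model, the sensitivity of 1, and the linear β schedule are all toy assumptions; PRIVX's trained reverse-diffusion denoiser and the real explainers are not reproduced here.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def gaussian_sigma(eps: float, delta: float, sensitivity: float = 1.0) -> float:
    """Classical Gaussian-mechanism calibration (formally proven for eps < 1;
    used here as a toy stand-in for whatever calibration the paper uses)."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

def matched_timestep(sigma: float, alpha_bar: np.ndarray) -> int:
    """DDPM timestep whose noise-to-signal ratio matches the DP noise,
    i.e. the t with (1 - abar_t) / abar_t closest to sigma**2."""
    return int(np.argmin(np.abs((1.0 - alpha_bar) / alpha_bar - sigma**2)))

rng = np.random.default_rng(0)

# Hypothetical setup: the explainer emits one importance score per candidate
# edge, higher on true edges (a stand-in for GNNExplainer-style edge masks).
n_candidates = 2000
true_edges = rng.integers(0, 2, size=n_candidates)
clean_scores = true_edges + 0.3 * rng.normal(size=n_candidates)

eps, delta = 5.0, 1e-5
sigma = gaussian_sigma(eps, delta)
dp_release = clean_scores + sigma * rng.normal(size=n_candidates)

# Standard linear beta schedule -> cumulative alpha_bar, then match sigma.
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
t_star = matched_timestep(sigma, alpha_bar)

# Even before any denoising, the DP output still ranks true edges highly;
# PRIVX would now run its reverse chain from t_star to sharpen this signal.
auc = roc_auc_score(true_edges, dp_release)
print(f"sigma(eps={eps}, delta={delta}) = {sigma:.2f}")
print(f"matched DDPM timestep t* = {t_star}")
print(f"edge-reconstruction AUC of the raw DP release = {auc:.3f}")
```

The point of the toy run is qualitative: at ε = 5 the per-edge signal is attenuated but far from destroyed, which is consistent with the abstract's finding that reconstruction AUC above 0.7 persists within typically deployed privacy budgets.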