Feature Attribution Stability Suite: How Stable Are Post-Hoc Attributions?
arXiv cs.AI / 4/6/2026
Key Points
- The paper introduces the Feature Attribution Stability Suite (FASS) to benchmark how stable post-hoc feature attribution methods are under realistic input perturbations while controlling for prediction changes.
- FASS improves evaluation by adding prediction-invariance filtering and splitting stability into structural similarity, rank correlation, and top-k Jaccard overlap, rather than relying on a single scalar metric.
- Experiments across Integrated Gradients, GradientSHAP, Grad-CAM, and LIME show that stability varies strongly by perturbation family, with geometric perturbations producing much larger attribution instability than photometric ones.
- Without conditioning on prediction preservation, the study finds that up to 99% of evaluated attribution pairs involve changed predictions, indicating that many prior stability results may conflate explanation fragility with model sensitivity.
- Under the controlled evaluation, Grad-CAM shows the most consistently stable attribution patterns across three datasets (ImageNet-1K, MS COCO, and CIFAR-10) and four architectures.
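
To make the evaluation idea above concrete, here is a minimal sketch of prediction-invariance filtering followed by two of the three stability components, top-k Jaccard overlap and rank correlation. It is not the paper's implementation: the function names, the `(pred_clean, pred_perturbed, attr_clean, attr_perturbed)` tuple layout, the default `k`, and the choice to rank raw (signed) attribution values are all assumptions made for illustration. The structural similarity component is left out; an off-the-shelf SSIM routine could be added alongside these two.

```python
import numpy as np


def topk_jaccard(attr_a, attr_b, k):
    """Jaccard overlap between the index sets of the k largest attributions."""
    top_a = set(np.argsort(attr_a.ravel())[-k:].tolist())
    top_b = set(np.argsort(attr_b.ravel())[-k:].tolist())
    return len(top_a & top_b) / len(top_a | top_b)


def rank_correlation(attr_a, attr_b):
    """Spearman-style correlation: Pearson correlation of the ranks.

    Ranks come from a double argsort, so ties are broken arbitrarily
    (a simplification assumed for this sketch).
    """
    ranks_a = np.argsort(np.argsort(attr_a.ravel())).astype(float)
    ranks_b = np.argsort(np.argsort(attr_b.ravel())).astype(float)
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])


def stability_scores(pairs, k=5):
    """Score only pairs whose prediction is unchanged by the perturbation.

    `pairs` holds (pred_clean, pred_perturbed, attr_clean, attr_perturbed)
    tuples; this layout is hypothetical, not taken from the paper.
    """
    pairs = list(pairs)
    # Prediction-invariance filter: drop pairs where the label changed,
    # so model sensitivity is not counted as explanation instability.
    kept = [(a, b) for p, q, a, b in pairs if p == q]
    if not kept:
        return None
    return {
        "kept_fraction": len(kept) / len(pairs),
        "topk_jaccard": float(np.mean([topk_jaccard(a, b, k) for a, b in kept])),
        "rank_corr": float(np.mean([rank_correlation(a, b) for a, b in kept])),
    }
```

Reporting `kept_fraction` next to the scores shows how much of the data survives the filter, which is the quantity behind the observation that most unfiltered pairs involve changed predictions.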




