Rethinking XAI Evaluation: A Human-Centered Audit of Shapley Benchmarks in High-Stakes Settings
arXiv cs.AI / 4/27/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that Shapley-based XAI is undermined by fragmented variants and by evaluation methods that rely on quantitative proxies whose connection to human usefulness is not validated.
- Using an amortized estimation framework, the authors compare eight Shapley variants to measure how much their attributions differ semantically under the low-latency constraints typical of operational risk workflows.
- They run large-scale experiments across four risk datasets and a realistic fraud-detection setting with professional analysts reviewing 3,735 cases.
- The results show quantitative metrics like sparsity and faithfulness do not reliably reflect human-perceived clarity or decision utility.
- Although no Shapley formulation improved objective analyst performance, the explanations increased decision confidence, raising a serious automation-bias risk in high-stakes environments.
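For readers unfamiliar with the method under audit: a Shapley attribution assigns each feature its average marginal contribution to the model's output over random feature orderings. The sketch below is an illustrative permutation-sampling estimator, not the paper's amortized framework; the toy linear model, baseline choice, and helper names are assumptions for the example.

```python
import random

def shapley_mc(model, x, baseline, n_samples=2000, seed=0):
    """Monte Carlo Shapley estimate for one instance.

    model: callable mapping a feature vector (list) to a scalar score.
    x: the instance to explain; baseline: reference values for "absent" features.
    """
    rng = random.Random(seed)
    d = len(x)
    phi = [0.0] * d
    for _ in range(n_samples):
        perm = list(range(d))
        rng.shuffle(perm)
        z = list(baseline)           # start with all features "absent"
        prev = model(z)
        for j in perm:               # reveal features in a random order
            z[j] = x[j]
            cur = model(z)
            phi[j] += cur - prev     # marginal contribution of feature j
            prev = cur
    return [p / n_samples for p in phi]

# Toy linear scorer: here the exact Shapley values are w_j * (x_j - baseline_j).
w = [2.0, -1.0, 0.5]
model = lambda z: sum(wi * zi for wi, zi in zip(w, z))
phi = shapley_mc(model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])

# Efficiency property: attributions sum to model(x) - model(baseline).
assert abs(sum(phi) - (model([1.0, 1.0, 1.0]) - model([0.0, 0.0, 0.0]))) < 1e-9
```

Variants of this scheme differ in how "absence" is defined (baseline, marginal, or conditional sampling), which is one source of the semantic divergence the paper measures across its eight formulations.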