Non-identifiability of Explanations from Model Behavior in Deep Networks of Image Authenticity Judgments
arXiv cs.CV / 4/9/2026
Key Points
- The paper examines whether deep neural networks that predict human image authenticity judgments also produce explanations that are robust and identifiable, rather than merely correlating with behavior.
- Experiments across multiple frozen vision models show that predictive accuracy can reach ~80% of the noise ceiling, but explanation quality varies: some models (e.g., VGG) appear to track general image quality rather than authenticity-specific factors (a noise-ceiling sketch follows this list).
- Attribution methods tested (Grad-CAM, LIME, and multiscale pixel masking) yield attribution maps that are stable within an architecture (especially for EfficientNetB3 and Barlow Twins) and are more consistent for images judged more authentic; an occlusion-style sketch of the masking approach appears below.
- However, attribution agreement across different architectures is weak even when predictive performance is similar, indicating that the explanations are not reliably identifiable; a per-image agreement sketch is given below.
- The authors use ensembling (sketched below) to improve authenticity prediction and to obtain more consistent image-level attributions, yet conclude that successful behavioral prediction does not imply that explanations reflect underlying cognitive mechanisms.
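
The "~80% of the noise ceiling" figure expresses model-behavior agreement relative to the reliability of the human judgments themselves. The sketch below shows one common way to compute such a ceiling-normalized score; the split-half estimator and the use of Pearson correlation are illustrative assumptions, not necessarily the paper's exact metric.

```python
import numpy as np

def split_half_noise_ceiling(ratings, n_splits=1000, seed=0):
    """Estimate the reliability (noise ceiling) of per-image authenticity ratings.

    ratings: array of shape (n_subjects, n_images).
    Returns the mean correlation between the mean ratings of two random subject halves.
    """
    rng = np.random.default_rng(seed)
    n_subjects = ratings.shape[0]
    corrs = []
    for _ in range(n_splits):
        perm = rng.permutation(n_subjects)
        half_a = ratings[perm[: n_subjects // 2]].mean(axis=0)
        half_b = ratings[perm[n_subjects // 2 :]].mean(axis=0)
        corrs.append(np.corrcoef(half_a, half_b)[0, 1])
    return float(np.mean(corrs))

def ceiling_normalized_score(model_pred, mean_rating, ceiling):
    """Model-behavior correlation expressed as a fraction of the noise ceiling."""
    return np.corrcoef(model_pred, mean_rating)[0, 1] / ceiling
```

A ceiling-normalized value near 0.8 would correspond to the figure quoted in the key points.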
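Of the three attribution methods named above, multiscale pixel masking is the simplest to illustrate: occlude patches of the image at several scales and measure how much the predicted authenticity score drops. The `model` callable, patch sizes, and baseline value below are assumptions for illustration, not the authors' exact configuration.

```python
import numpy as np

def masking_attribution(model, image, patch_sizes=(8, 16, 32), baseline=0.0):
    """Occlusion-style attribution map for a single image.

    model: callable mapping an (H, W, C) image to a scalar authenticity score.
    Returns an (H, W) map; larger values mark regions whose occlusion lowers the score most.
    """
    h, w = image.shape[:2]
    base_score = model(image)
    attribution = np.zeros((h, w), dtype=np.float64)
    counts = np.zeros((h, w), dtype=np.float64)
    for p in patch_sizes:
        for y in range(0, h, p):
            for x in range(0, w, p):
                masked = image.copy()
                masked[y : y + p, x : x + p] = baseline  # occlude one patch
                attribution[y : y + p, x : x + p] += base_score - model(masked)
                counts[y : y + p, x : x + p] += 1
    return attribution / np.maximum(counts, 1)  # average score drop per pixel across scales
```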
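Cross-architecture (non-)agreement of explanations can be quantified by correlating the attribution maps that different models produce for the same image. The rank-correlation choice below is an assumption; the paper may use a different similarity measure.

```python
import numpy as np
from scipy.stats import spearmanr

def attribution_agreement(map_a, map_b):
    """Spearman rank correlation between two attribution maps for one image."""
    return spearmanr(map_a.ravel(), map_b.ravel()).correlation

def mean_pairwise_agreement(maps):
    """Average agreement over all pairs of models' maps for the same image."""
    pairs = [
        attribution_agreement(maps[i], maps[j])
        for i in range(len(maps))
        for j in range(i + 1, len(maps))
    ]
    return float(np.mean(pairs))
```

Low pairwise agreement despite similar behavioral accuracy is the non-identifiability pattern the title refers to.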
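Ensembling in this context usually means averaging per-model outputs, and the same averaging can be applied to per-model attribution maps to obtain a single image-level map. The function names below are hypothetical, minimal illustrations of that idea.

```python
import numpy as np

def ensemble_predictions(per_model_preds):
    """Average per-image predictions across models: (n_models, n_images) -> (n_images,)."""
    return np.asarray(per_model_preds).mean(axis=0)

def ensemble_attribution(per_model_maps):
    """Average per-model attribution maps: (n_models, H, W) -> (H, W)."""
    return np.asarray(per_model_maps).mean(axis=0)
```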