Causal Attribution via Activation Patching
arXiv cs.CV / 3/17/2026
Key Points
- CAAP (Causal Attribution via Activation Patching) estimates each image patch's influence by intervening directly on internal activations, rather than relying on learned masks or synthetic perturbations.
- For each patch, the method inserts the corresponding source-image activations into a neutral target context across intermediate layers and uses the resulting target-class score as the attribution signal.
- The approach aims to capture the causal contribution of patch-specific internal representations, avoiding late-layer global mixing that can reduce spatial localization.
- The authors report that CAAP outperforms existing attribution methods across multiple ViT backbones on standard metrics, producing more faithful and better-localized attribution maps.
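The patching procedure described in the key points can be illustrated with a minimal sketch. This is not the authors' implementation: the toy model below (a token-wise residual block with a mean-mixing term standing in for attention), the all-zero neutral context, the choice of patched layers, the use of the raw logit as the target-class score, and every function name are assumptions made for illustration only.

```python
import math
import random

def make_toy_model(dim=4, n_layers=3, n_classes=2, seed=0):
    # Hypothetical stand-in for a ViT: one weight matrix per block plus a linear head.
    rng = random.Random(seed)
    layers = [[[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
              for _ in range(n_layers)]
    head = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(n_classes)]
    return layers, head

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def mean_token(tokens):
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def block(W, tokens):
    # Each token sees a global mean (crude substitute for attention), then a residual update.
    ctx = mean_token(tokens)
    out = []
    for t in tokens:
        h = matvec(W, [a + 0.5 * c for a, c in zip(t, ctx)])
        out.append([a + math.tanh(b) for a, b in zip(t, h)])
    return out

def forward(model, tokens, patch_index=None, patch_acts=None):
    # patch_acts maps layer index -> activation vector that overwrites
    # token `patch_index` right after that layer.
    layers, head = model
    x = [list(t) for t in tokens]
    cache = []
    for l, W in enumerate(layers):
        x = block(W, x)
        if patch_acts is not None and l in patch_acts:
            x[patch_index] = list(patch_acts[l])
        cache.append([list(t) for t in x])
    logits = matvec(head, mean_token(x))
    return logits, cache

def patch_attribution(model, source, neutral, target_class, patch_layers):
    # 1) Cache the source image's activations for every layer and token.
    _, src_cache = forward(model, source)
    scores = []
    for i in range(len(source)):
        # 2) Insert patch i's source activations into the neutral run.
        acts = {l: src_cache[l][i] for l in patch_layers}
        logits, _ = forward(model, neutral, patch_index=i, patch_acts=acts)
        # 3) The resulting target-class score is the attribution for patch i.
        scores.append(logits[target_class])
    return scores

model = make_toy_model()
rng = random.Random(1)
source = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(4)]  # 4 patch tokens
neutral = [[0.0] * 4 for _ in range(4)]  # assumed neutral context: all zeros
scores = patch_attribution(model, source, neutral, target_class=1, patch_layers=[0, 1])
print(scores)  # one attribution value per patch
```

In this sketch the last block is left unpatched, so the inserted activations still pass through one round of mixing before the classifier head. With a real ViT, the same idea would typically be implemented with forward hooks on the chosen transformer blocks rather than a hand-written loop.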