Causal Attribution via Activation Patching
arXiv cs.CV / 3/17/2026
📰 News · Models & Research
Key Points
- CAAP (Causal Attribution via Activation Patching) estimates patch influence through patch-level activation patching, intervening on internal activations rather than relying on learned masks or synthetic input perturbations.
- For each patch, the method inserts the corresponding source-image activations into a neutral target context across intermediate layers and uses the resulting target-class score as the attribution signal.
- The approach aims to capture the causal contribution of patch-specific internal representations, avoiding late-layer global mixing that can reduce spatial localization.
- Empirical results show CAAP outperforms existing attribution methods across multiple ViT backbones and standard metrics, producing more faithful and localized attribution maps.
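The patching procedure in the bullets above can be sketched on a toy model. This is a minimal, self-contained NumPy illustration of the activation-patching idea, not the paper's implementation: the two-layer "ViT" stand-in, the neutral (zero) baseline, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
P, D, C = 4, 8, 3  # number of patches, hidden dim, number of classes

# Toy stand-in for a ViT: two per-patch layers + mean pool + classifier head.
W1 = rng.normal(size=(D, D))
W2 = rng.normal(size=(D, D))
Wc = rng.normal(size=(D, C))

def forward(x, patch=None):
    """Run the toy model on patch embeddings x of shape (P, D).

    If patch = (idx, act), overwrite the layer-1 activation of patch
    idx with act -- the activation-patching intervention."""
    h = np.tanh(x @ W1)        # layer-1 activations, one row per patch
    if patch is not None:
        idx, act = patch
        h = h.copy()
        h[idx] = act           # insert the source-image activation
    h = np.tanh(h @ W2)        # layer 2
    return h.mean(axis=0) @ Wc # class scores

source = rng.normal(size=(P, D))  # patch embeddings of the source image
neutral = np.zeros((P, D))        # neutral target context (e.g., a blank image)

# Cache the source image's intermediate activations at layer 1.
src_acts = np.tanh(source @ W1)

# Class whose score serves as the attribution signal.
target_class = int(forward(source).argmax())

# Attribution per patch: change in the target-class score when that patch's
# source activation is patched into the neutral run.
base = forward(neutral)[target_class]
attr = np.array([
    forward(neutral, patch=(i, src_acts[i]))[target_class] - base
    for i in range(P)
])
print(attr.shape)  # one attribution value per patch
```

In a real ViT the same intervention would typically be done with forward hooks on the chosen intermediate blocks, patching the token corresponding to each spatial patch; the one-layer, one-patch version here is only meant to show the shape of the computation.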
Related Articles
[D] Matryoshka Representation Learning
Reddit r/MachineLearning
Two new Qwen3.5 “Neo” fine‑tunes focused on fast, efficient reasoning
Reddit r/LocalLLaMA
HKIC, Gobi Partners and HKU team up for fund backing university research start-ups
SCMP Tech
Yann LeCun’s New LeWorldModel (LeWM) Research Targets JEPA Collapse in Pixel-Based Predictive World Modeling
MarkTechPost
Streaming experts
Simon Willison's Blog