AI Navigate

Sparse but not Simpler: A Multi-Level Interpretability Analysis of Vision Transformers

arXiv cs.CV / 3/18/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The authors introduce IMPACT, a multi-level interpretability framework for Vision Transformers that evaluates neurons, layer representations, task circuits, and model-level attribution.
  • The framework is applied to DeiT-III B/16 models pruned with Wanda, using BatchTopK sparse autoencoders and learnable node masking to analyze representations and circuits.
  • Structural sparsity reduces circuit edges by roughly 2.5×, but the fraction of active nodes stays similar or higher, indicating that pruning redistributes computation rather than creating simpler modules.
  • Sparse models show no systematic gains in neuron-level selectivity, SAE feature interpretability, or attribution faithfulness, suggesting that sparsity alone does not reliably improve interpretability in vision models.
  • The work argues for evaluation frameworks that assess interpretability beyond circuit compactness.
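To make the BatchTopK sparse autoencoder step concrete: unlike a per-sample TopK SAE, BatchTopK keeps the `batch_size × k` largest latent activations across the whole batch, letting individual samples use more or fewer features. The following is an illustrative numpy sketch of that sparsification step only (not the paper's implementation; array shapes and values are invented for the example):

```python
import numpy as np

def batch_topk(acts: np.ndarray, k: int) -> np.ndarray:
    """BatchTopK sparsity: keep the batch_size * k largest latent
    activations across the ENTIRE batch, zeroing the rest.
    (A per-sample TopK SAE would instead keep exactly k per row.)"""
    n_keep = acts.shape[0] * k
    flat = acts.ravel()
    keep = np.argpartition(flat, -n_keep)[-n_keep:]  # indices of top n_keep values
    out = np.zeros_like(flat)
    out[keep] = flat[keep]
    return out.reshape(acts.shape)

# Toy encoder pre-activations: 3 samples x 4 latent features,
# keeping 3 * 2 = 6 activations batch-wide.
pre = np.array([[0.9, 0.7, 0.5, 0.2],
                [0.8, 0.3, 0.05, 0.1],
                [0.6, 0.4, 0.15, 0.25]])
sparse = batch_topk(pre, k=2)
```

Note how the budget is allocated adaptively: here the first sample retains three active features while the second retains only one, which is exactly the flexibility that distinguishes BatchTopK from a fixed per-sample top-k.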

Abstract

Sparse neural networks are often hypothesized to be more interpretable than dense models, motivated by findings that weight sparsity can produce compact circuits in language models. However, it remains unclear whether structural sparsity itself leads to improved semantic interpretability. In this work, we systematically evaluate the relationship between weight sparsity and interpretability in Vision Transformers using DeiT-III B/16 models pruned with Wanda. To assess interpretability comprehensively, we introduce **IMPACT**, a multi-level framework that evaluates interpretability across four complementary levels: neurons, layer representations, task circuits, and model-level attribution. Layer representations are analyzed using BatchTopK sparse autoencoders, circuits are extracted via learnable node masking, and explanations are evaluated with transformer attribution using insertion and deletion metrics. Our results reveal a clear structural effect but limited interpretability gains. Sparse models produce circuits with approximately 2.5× fewer edges than dense models, yet the fraction of active nodes remains similar or higher, indicating that pruning redistributes computation rather than isolating simpler functional modules. Consistent with this observation, sparse models show no systematic improvements in neuron-level selectivity, SAE feature interpretability, or attribution faithfulness. These findings suggest that structural sparsity alone does not reliably yield more interpretable vision models, highlighting the importance of evaluation frameworks that assess interpretability beyond circuit compactness.
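The deletion metric mentioned in the abstract works by removing input pixels in order of decreasing attribution importance and tracking how quickly the model's confidence collapses: a faithful attribution produces a fast collapse (low area under the confidence curve), while insertion is the mirror image (reveal pixels from a blank baseline; faithful attributions rise fast). A minimal sketch of the deletion variant, with an invented toy "model" standing in for a classifier logit (not the paper's evaluation code):

```python
import numpy as np

def deletion_score(model, image, attribution, steps=10, baseline=0.0):
    """Deletion faithfulness metric: zero out pixels in decreasing
    attribution order and average the model's confidence along the
    way. LOWER scores indicate a MORE faithful attribution, since
    removing truly important pixels should collapse confidence fast."""
    order = np.argsort(attribution.ravel())[::-1]  # most important first
    x = image.ravel().astype(float).copy()
    n = x.size
    scores = [model(x.reshape(image.shape))]       # confidence on full input
    for s in range(1, steps + 1):
        x[order[: s * n // steps]] = baseline      # cumulative removal
        scores.append(model(x.reshape(image.shape)))
    return float(np.mean(scores))                  # proxy for area under the curve

# Toy check: the "model" scores by total intensity, so the image itself
# is a perfect attribution map and its negation is maximally wrong.
img = np.arange(1.0, 17.0).reshape(4, 4)
model = lambda x: x.sum()
faithful = deletion_score(model, img, attribution=img)
unfaithful = deletion_score(model, img, attribution=-img)  # faithful < unfaithful
```

Swapping the removal loop for progressive insertion from an all-baseline input (where higher is better) gives the companion insertion metric; the paper reports both.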