Steering Sparse Autoencoder Latents to Control Dynamic Head Pruning in Vision Transformers (Student Abstract)

arXiv cs.CV / 3/31/2026


Key Points

  • The paper addresses the challenge that dynamic head pruning in Vision Transformers is typically hard to interpret and control with existing pruning policies.
  • It proposes a framework that trains a Sparse Autoencoder (SAE) on the ViT’s final-layer residual embeddings and then uses amplified sparse latents to drive different pruning decisions.
  • The approach supports “per-class steering,” which discovers compact, class-specific subsets of attention heads while maintaining accuracy.
  • As a reported example, steering for the “bowl” class improves accuracy from 76% to 82% while reducing average head usage from 0.72 to 0.33, pruning down to just heads h2 and h5.
  • The authors argue the method links pruning efficiency with mechanistic interpretability by making pruning behavior controllable through sparse, disentangled features.
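The core mechanism in the points above — encode a residual embedding into sparse latents, amplify selected latents, and decode back before the pruning policy reads the result — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the weight names (`W_enc`, `W_dec`, biases) and the ReLU encoder follow common SAE conventions and are assumptions.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc):
    # ReLU gives non-negative latents; for a trained SAE most entries are zero.
    return np.maximum(0.0, x @ W_enc + b_enc)

def sae_decode(z, W_dec, b_dec):
    # Map sparse latents back to the ViT residual-embedding space.
    return z @ W_dec + b_dec

def steer(x, W_enc, b_enc, W_dec, b_dec, latent_idx, alpha):
    """Amplify one sparse latent by a factor alpha, then decode.
    The steered embedding is what a downstream pruning policy would
    consume in place of the original residual embedding."""
    z = sae_encode(x, W_enc, b_enc)
    z = z.copy()
    z[..., latent_idx] *= alpha
    return sae_decode(z, W_dec, b_dec)
```

With `alpha = 1.0` the steered output reduces to a plain SAE reconstruction; larger `alpha` pushes the embedding along one interpretable latent direction, which is what alters the pruning decisions.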

Abstract

Dynamic head pruning in Vision Transformers (ViTs) improves efficiency by removing redundant attention heads, but existing pruning policies are often difficult to interpret and control. In this work, we propose a novel framework that integrates Sparse Autoencoders (SAEs) with dynamic pruning, leveraging their ability to disentangle dense embeddings into interpretable and controllable sparse latents. Specifically, we train an SAE on the final-layer residual embedding of the ViT and amplify the sparse latents under different strategies to alter pruning decisions. Among these, per-class steering reveals compact, class-specific head subsets that preserve accuracy. For example, the bowl class improves in accuracy (76% to 82%) while reducing head usage (0.72 to 0.33) via heads h2 and h5. These results show that sparse latent features enable class-specific control of dynamic pruning, effectively bridging pruning efficiency and mechanistic interpretability in ViTs.
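The abstract's per-class result can be made concrete with a small sketch of head masking and the head-usage metric. The paper does not state the layer's head count; six heads is an assumption chosen only because keeping heads h2 and h5 out of six reproduces the reported usage of 0.33 (2/6). The function names are illustrative, not from the paper.

```python
import numpy as np

def head_usage(mask):
    # Fraction of attention heads the pruning decision keeps active.
    return float(mask.mean())

def masked_attention_output(head_outputs, mask):
    # head_outputs: (H, d) array of per-head outputs. Pruned heads are
    # zeroed before summation, a simple stand-in for dynamic head pruning
    # ahead of the output projection.
    return (head_outputs * mask[:, None]).sum(axis=0)

# Hypothetical 6-head layer; per-class steering for "bowl" keeps h2 and h5.
mask = np.zeros(6)
mask[[2, 5]] = 1.0  # head usage = 2/6 ≈ 0.33, matching the reported figure
```

Under this reading, per-class steering amounts to selecting a compact binary mask per class, so the accuracy/usage trade-off in the abstract is a property of which heads survive rather than of a global sparsity budget.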