Sparse Visual Thought Circuits in Vision-Language Models

arXiv cs.AI / March 27, 2026


Key Points

  • The paper tests a key assumption for interpretability and steering in multimodal (vision-language) models: that sparse autoencoder (SAE) features form modular, composable “units” for reasoning.

Abstract

Sparse autoencoders (SAEs) improve interpretability in multimodal models, but it remains unclear whether SAE features form modular, composable units for reasoning, an assumption underlying many intervention-based steering methods. We test this modularity hypothesis and find it often fails: intervening on a task-selective feature set can modestly improve reasoning accuracy, while intervening on the union of two such sets reliably induces output drift (large unintended changes in predictions) and degrades accuracy, even under norm-matched perturbations. This non-modular circuit interference is consistent with shared internal pathways in which feature unions amplify activation shifts. We develop a reproducible causal pipeline to localize and test these sparse visual thought circuits in Qwen3-VL-8B. On a controlled synthetic benchmark with seven task types and three difficulty levels, linear probes identify a mid-decoder locus for task-type information. We train SAEs at this layer, construct task-selective feature sets via an explicit rule, and perform inference-time scaling and ablation while quantifying accuracy and drift. Our findings, validated with bootstrapped subsamples and permutation controls and replicated across multiple VLM families and five diverse datasets, clarify the boundaries of SAE feature composability and provide a rigorous diagnostic framework for more reliable VLM control.
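To make the intervention step concrete, here is a minimal PyTorch sketch of SAE feature steering with norm matching and a drift measurement. This is not the authors' code: the `sae` object, `steer_features`, `norm_match`, and `output_drift` names are hypothetical, and it assumes an SAE exposing `encode`/`decode` methods over residual-stream activations at the probed layer.

```python
import torch

def steer_features(h, sae, feature_idx, alpha=2.0):
    """Scale a task-selective SAE feature set at one layer.

    h: (batch, d_model) residual-stream activations at the SAE layer.
    sae: assumed object with encode()/decode() mapping d_model <-> n_features.
    feature_idx: indices of the task-selective feature set.
    alpha: multiplicative scaling factor applied at inference time.
    """
    f = sae.encode(h)                   # sparse feature activations
    f_steered = f.clone()
    f_steered[:, feature_idx] *= alpha  # scale only the selected set
    delta = sae.decode(f_steered) - sae.decode(f)
    return h + delta                    # patched activations

def norm_match(delta_ref, delta):
    """Rescale delta so its per-example L2 norm matches delta_ref,
    isolating the effect of direction from perturbation magnitude."""
    scale = delta_ref.norm(dim=-1, keepdim=True) / (
        delta.norm(dim=-1, keepdim=True) + 1e-8
    )
    return delta * scale

def output_drift(preds_base, preds_steered):
    """Fraction of examples whose prediction changed after steering."""
    return (preds_base != preds_steered).float().mean().item()
```

The norm matching controls for perturbation magnitude: if the union of two feature sets still induces more drift than either set alone after its write-back is rescaled to the same norm, the degradation cannot be attributed to a larger perturbation, pointing instead to interference along shared pathways.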
