Dissociating Decodability and Causal Use in Bracket-Sequence Transformers
arXiv cs.LG · April 27, 2026
Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper studies how transformers trained on bracket sequences (Dyck languages) represent hierarchical structure, asking whether observed internal signals are merely decodable or actually used causally.
- Through probing and intervention on the residual stream and attention patterns, the authors find that depth, distance, and top-of-stack signals are all decodable but do not all play the same causal role (a minimal probing sketch follows this list).
- Masking attention specifically at the true top-of-stack position sharply reduces long-distance accuracy, indicating that this attention behavior is causally important (see the attention-masking sketch below).
- In contrast, ablating low-dimensional residual-stream subspaces has comparatively little effect, suggesting that not all decodable internal representations are causally necessary (see the subspace-ablation sketch below).
- The findings also hold in a templated natural-language setting, reinforcing the general claim that decodability alone does not guarantee causal use of internal variables.
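To make the decodability side of the claim concrete, here is a minimal linear-probing sketch in PyTorch. It assumes access to frozen residual-stream activations from a trained Dyck-language transformer and per-token integer labels such as stack depth; the shapes, names, and hyperparameters are illustrative, not the authors' actual setup.

```python
import torch
import torch.nn as nn

d_model, n_classes = 128, 8  # illustrative sizes, not the paper's

probe = nn.Linear(d_model, n_classes)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

def train_probe(resid: torch.Tensor, labels: torch.Tensor, steps: int = 1000) -> float:
    """Fit a linear probe on frozen residual-stream activations.

    resid:  (n_tokens, d_model) activations collected at one layer.
    labels: (n_tokens,) integer targets, e.g. stack depth or top-of-stack type.
    """
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(probe(resid), labels)
        loss.backward()
        opt.step()
    # High probe accuracy only shows the signal is decodable,
    # not that the model causally relies on it.
    with torch.no_grad():
        return (probe(resid).argmax(-1) == labels).float().mean().item()
```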
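The causal test on attention can be sketched as a post-softmax intervention: zero the attention weight from a query position to the true top-of-stack key, renormalize the row, and re-run the forward pass with the patched weights (typically via a hook). The function below is a hypothetical illustration of that edit, not the paper's code.

```python
import torch

def mask_top_of_stack(attn_probs: torch.Tensor,
                      query_pos: int,
                      top_of_stack_pos: int) -> torch.Tensor:
    """Zero attention from one query position to the true top-of-stack key,
    then renormalize so each affected row still sums to 1.

    attn_probs: (n_heads, seq_len, seq_len) post-softmax attention weights.
    """
    patched = attn_probs.clone()
    patched[:, query_pos, top_of_stack_pos] = 0.0
    patched[:, query_pos] /= patched[:, query_pos].sum(-1, keepdim=True).clamp_min(1e-9)
    return patched
```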
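The residual-stream ablation can likewise be sketched as projecting activations onto the orthogonal complement of a low-dimensional subspace, for example one spanned by a trained probe's weight rows. Again, this is an assumed implementation for illustration, not the authors' exact procedure.

```python
import torch

def ablate_subspace(resid: torch.Tensor, directions: torch.Tensor) -> torch.Tensor:
    """Remove a low-dimensional subspace from residual-stream activations.

    resid:      (..., d_model) activations at some layer.
    directions: (k, d_model) basis for the subspace to ablate,
                e.g. the rows of a trained probe's weight matrix.
    """
    # Orthonormalize the basis, then subtract the projection onto it.
    q, _ = torch.linalg.qr(directions.T)   # (d_model, k), orthonormal columns
    proj = (resid @ q) @ q.T               # projection onto the ablated subspace
    return resid - proj
```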