Counting Circuits: Mechanistic Interpretability of Visual Reasoning in Large Vision-Language Models
arXiv cs.CV / 3/20/2026
Key Points
- LVLMs count much as humans do: they are precise on small numerosities and give noisy estimates on larger quantities, as shown on controlled synthetic and real-world benchmarks.
- The authors introduce two interpretability methods, Visual Activation Patching and HeadLens, to uncover a structured counting circuit shared across a range of visual reasoning tasks.
- They demonstrate a lightweight intervention: fine-tuning pretrained LVLMs on counting with synthetic images. This improves in-distribution counting and yields an average gain of +8.36% on out-of-distribution counting benchmarks and +1.54% on complex general visual reasoning for Qwen2.5-VL.
- The results suggest counting is central to visual reasoning and point to a practical pathway for boosting overall capabilities by targeting counting mechanisms.
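
For readers unfamiliar with the idea behind activation patching, here is a minimal sketch of the generic technique on a toy network. It is not the authors' Visual Activation Patching or HeadLens; the weights, the `forward` and `patching_effects` helpers, and the inputs are all hypothetical and chosen only for illustration. The sketch runs a "clean" and a "corrupted" input, copies one hidden activation at a time from the clean run into the corrupted run, and reports how much of the output gap each unit restores.

```python
# Toy two-layer network with hand-picked weights (hypothetical, for illustration only).
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]  # 3 hidden units, 2 inputs
W2 = [1.0, 0.2, -1.0]                        # output weights, one per hidden unit


def forward(x, patch=None):
    """Run the toy network; `patch` maps hidden-unit index -> activation to overwrite."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]  # ReLU layer
    if patch:
        for idx, value in patch.items():
            hidden[idx] = value
    output = sum(w * h for w, h in zip(W2, hidden))
    return output, hidden


def patching_effects(clean_x, corrupt_x):
    """Fraction of the clean-vs-corrupt output gap restored by patching each hidden unit."""
    clean_out, clean_hidden = forward(clean_x)
    corrupt_out, _ = forward(corrupt_x)
    gap = clean_out - corrupt_out
    if gap == 0:
        raise ValueError("clean and corrupted outputs are identical; nothing to restore")
    effects = []
    for idx in range(len(W1)):
        # Re-run the corrupted input, overwriting one unit with its clean activation.
        patched_out, _ = forward(corrupt_x, patch={idx: clean_hidden[idx]})
        effects.append((patched_out - corrupt_out) / gap)
    return effects


if __name__ == "__main__":
    for idx, effect in enumerate(patching_effects([2.0, 0.0], [0.0, 2.0])):
        print(f"hidden unit {idx}: restores {effect:.2f} of the output gap")
```

Units whose patched activation restores a large share of the gap are candidates for belonging to the circuit under study. In a real LVLM the same loop would run over attention heads or layer outputs rather than three hidden units.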