Do Sparse Autoencoders Capture Concept Manifolds?
arXiv cs.LG / 5/1/2026
Key Points
- The paper questions a common SAE assumption that concepts align with independent linear directions and instead argues that many concepts lie on low-dimensional geometric manifolds.
- It proposes a theoretical framework for when and how sparse autoencoders can capture manifolds, distinguishing two mechanisms: a global scheme using a compact set of atoms spanning the whole manifold, and a local scheme using features that tile restricted regions of the geometry.
- The authors provide empirical evidence that SAEs often fail to cleanly recover continuous manifold structure, instead blending global and local solutions in what they term a “dilution” regime.
- The dilution behavior helps explain why manifold structure is rarely apparent when inspecting individual learned concepts, motivating post-hoc unsupervised methods to discover coherent groups of atoms rather than isolated directions.
- Overall, the findings suggest that future interpretability work should treat geometric objects (manifold-like groups of features), rather than single feature directions, as the basic unit of analysis.
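The global-versus-local distinction above can be made concrete with a toy example (a sketch of the general idea, not the paper's actual construction): points on the unit circle, a 1-D manifold in R², reconstructed by a "local" sparse code in which many atoms tile the circle and only atoms near a given point activate.

```python
import numpy as np

# Hypothetical local tiling code for a 1-D manifold (the unit circle):
# K unit-norm atoms spaced around the circle, with a ReLU threshold so
# only atoms close to each input point fire.
K = 32
angles = 2 * np.pi * np.arange(K) / K
D = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (K, 2) dictionary

def encode(x, thresh=0.9):
    # Activation = ReLU(similarity to atom - threshold); sparse because
    # only atoms within ~acos(thresh) of x have positive similarity margin.
    return np.maximum(D @ x - thresh, 0.0)

def decode(a):
    # Weighted sum of active atoms, renormalized back onto the circle.
    y = D.T @ a
    return y / np.linalg.norm(y)

# Sample points across the whole manifold and check that codes are
# sparse while reconstruction error stays small.
rng = np.random.default_rng(0)
errs, sparsities = [], []
for t in rng.uniform(0, 2 * np.pi, 200):
    x = np.array([np.cos(t), np.sin(t)])
    a = encode(x)
    errs.append(np.linalg.norm(decode(a) - x))
    sparsities.append(np.count_nonzero(a))

print(f"max error: {max(errs):.4f}, mean active atoms: {np.mean(sparsities):.1f} / {K}")
```

Note the contrast with a "global" scheme, which would span the same circle with just two dense atoms (the cosine and sine directions); the paper's "dilution" regime corresponds to a trained SAE landing in an uninterpretable mixture of the two.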