Sparse Auto-Encoders and Holism about Large Language Models
arXiv cs.AI / 3/30/2026
Key Points
- The paper asks what large language models imply about “meta-semantics”: how words and complex expressions come to have the meanings they do.
- It reviews prior arguments that LLMs, through their distributional semantics, embody a holistic conception of meaning, and notes that mechanistic interpretability may challenge that view.
- It then introduces recent findings from sparse auto-encoders, which uncover large numbers of interpretable latent features in LLM embedding spaces and so motivate a more decompositional picture of meaning (a minimal sketch of the technique follows this list).
- Finally, the author analyzes the nature of these features and concludes that holism can still hold if the relevant features are countable.
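The paper itself contains no code; as a rough illustration only, the sketch below shows the standard sparse auto-encoder recipe from the interpretability literature: an over-complete linear encoder/decoder trained to reconstruct model activations under an L1 sparsity penalty, so that each decoder column can be read as a candidate interpretable feature direction. All dimensions, hyperparameters, and the random stand-in activations here are placeholders, not the setup used in the paper.

```python
import torch
import torch.nn as nn


class SparseAutoEncoder(nn.Module):
    """Over-complete autoencoder with a ReLU code.

    Decomposes d_model-dimensional activations into n_latents sparse
    features; each decoder column is one candidate feature direction.
    """

    def __init__(self, d_model: int, n_latents: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))  # sparse, non-negative code
        x_hat = self.decoder(z)          # reconstruction of the input
        return x_hat, z


# Toy training loop on random "activations"; real work would train on
# activations captured from a specific layer of an actual LLM.
d_model, n_latents, l1_coef = 64, 512, 1e-3
sae = SparseAutoEncoder(d_model, n_latents)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)

for step in range(200):
    x = torch.randn(128, d_model)  # stand-in for LLM activations
    x_hat, z = sae(x)
    # Reconstruction error plus L1 penalty, which pushes most latents to zero.
    loss = ((x_hat - x) ** 2).mean() + l1_coef * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")
```

In practice, the learned latents are then interpreted by finding the inputs that most strongly activate each one; whether the resulting features are countable in the relevant sense is precisely the question the paper raises for holism.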




