A framework for analyzing concept representations in neural models

arXiv cs.CL / 5/5/2026


Key Points

  • The paper proposes a unified framework for analyzing how neural models represent human-interpretable concepts, evaluating candidate concept subspaces along two axes: containment (the concept is fully represented within the subspace and not outside it) and disentanglement (the subspace is isolated from other concepts); a probe-based sketch of both tests follows this list.
  • Experiments on text and speech models show that concept subspaces are not necessarily uniquely determined, which complicates their interpretation.
  • The authors compare five subspace estimators proposed across different research communities and find that the choice of estimator significantly affects the measured containment and disentanglement properties.
  • While the concept erasure method LEACE performs well on both axes, it still has difficulty generalizing to unseen data.
  • In HuBERT speech representations, phone information is both contained and disentangled relative to speaker information, whereas speaker information is difficult to capture in a compact subspace even when it is disentangled from phones.
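
As a rough illustration, the two axes can be operationalized with linear probes. The sketch below assumes an orthonormal basis B for a candidate concept subspace and compares probe accuracy for the target concept inside the subspace, for the target concept outside it, and for a competing concept inside it; the function names and the training-accuracy shortcut are illustrative, not the paper's exact estimators.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_accuracy(feats, labels):
    # Accuracy of a linear probe; a held-out split would be used in
    # practice, training accuracy keeps the sketch short.
    clf = LogisticRegression(max_iter=1000).fit(feats, labels)
    return clf.score(feats, labels)

def subspace_tests(X, y_concept, y_other, B):
    """X: (n, d) representations; B: (d, k) orthonormal basis of a
    candidate concept subspace; y_concept / y_other: labels for the
    target concept and a competing one (e.g. phone vs. speaker)."""
    X_in = X @ B                # coordinates inside the subspace
    X_out = X - (X @ B) @ B.T   # component in the orthogonal complement

    return {
        # Containment: decodable inside, ideally not outside.
        "concept_inside": probe_accuracy(X_in, y_concept),
        "concept_outside": probe_accuracy(X_out, y_concept),
        # Disentanglement: the competing concept should stay near
        # chance when probed from the candidate subspace.
        "other_inside": probe_accuracy(X_in, y_other),
    }
```

Per-concept chance baselines are needed to read the scores: containment wants the first number high and the second near chance, while disentanglement wants the third near chance.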

Abstract

Understanding how neural models represent human-interpretable concepts is challenging. Prior work has explored linear concept subspaces from diverse perspectives, such as probing and concept erasure. We introduce a unified framework to study these subspaces along two axes: containment, which tests if a concept is fully represented in a subspace but not outside it, and disentanglement, which tests for isolation from other concepts. In experiments on both text and speech models, we first highlight that concept subspaces may not be uniquely determined, and discuss the implications for concept subspace analysis. Then, we compare properties of concept subspaces estimated using five estimators, proposed in different communities. We find that (1) the choice of estimator impacts the containment and disentanglement properties; (2) the state-of-the-art concept erasure method, LEACE, performs well on both testing axes, but still struggles to generalize to unseen data; and (3) in HuBERT speech representations, phone information is both contained and disentangled from speaker information, while speaker information is hard to contain in a compact subspace, despite being disentangled from phones.
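
For context on the erasure side: LEACE (Belrose et al., 2023) is a closed-form affine map that makes a concept linearly unpredictable while perturbing the representations as little as possible. The sketch below conveys only the core idea for a binary concept, namely that equalizing the two class means removes the linearly decodable signal; it is not LEACE's least-damaging whitened solution, and all names are illustrative.

```python
import numpy as np

def fit_mean_difference_eraser(X, z):
    """Fit a crude linear eraser for a binary concept z in {0, 1}.
    Projecting out the class mean-difference direction equalizes the
    class means, the condition LEACE targets so that linear probes do
    no better than a constant predictor; LEACE's closed form further
    minimizes the distortion of X."""
    mu = X.mean(axis=0)
    delta = X[z == 1].mean(axis=0) - X[z == 0].mean(axis=0)
    u = delta / np.linalg.norm(delta)  # unit concept direction

    def erase(X_new):
        Xc = X_new - mu
        return mu + Xc - np.outer(Xc @ u, u)  # remove component along u

    return erase
```

This also makes the reported generalization gap concrete: an eraser fit on one split can leave the concept partly recoverable on held-out data whose class statistics differ from the training split.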