Linear Representations of Hierarchical Concepts in Language Models

arXiv cs.CL / 4/10/2026


Key Points

  • The paper studies whether language models internally encode hierarchical relations between concepts (e.g., Japan ⊂ Eastern Asia ⊂ Asia) and how this encoding manifests in their representations.
  • It extends “Linear Relational Concepts” by training a linear transformation for each hierarchical depth and semantic domain, then comparing these transformations to characterize representational differences tied to hierarchy (see the sketch after this list).
  • Experiments show that, within a given domain, hierarchical relations can be linearly recovered from model representations, including for multi-token entities and across layers.
  • The analysis finds that hierarchy information resides in a relatively low-dimensional subspace, which is often domain-specific, yet the learned hierarchy representation is highly similar across those domain-specific subspaces.
  • Overall, the authors argue that concept hierarchies in the tested models are captured via highly interpretable linear representations, with results supported by both in-domain generalization and cross-domain transfer evaluations.
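
The summary above does not spell out the training objective, so the following is a minimal sketch of one plausible variant: fitting an affine map from child-entity hidden states to parent-category hidden states via ridge-regularized least squares, one map per (domain, depth) pair. The function name `fit_hierarchy_map` and the regularization choice are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: fit one affine map per (domain, depth) pair that sends
# child-entity hidden states to parent-category hidden states. Ridge-regularized
# least squares is an assumption here; the paper's exact objective may differ.
import numpy as np

def fit_hierarchy_map(child_reps: np.ndarray, parent_reps: np.ndarray, reg: float = 1e-3):
    """Fit parent ≈ W @ child + b for a single (domain, depth) cell.

    child_reps:  (n_pairs, d) hidden states of child entities (e.g., "Japan").
    parent_reps: (n_pairs, d) hidden states of their parents (e.g., "Eastern Asia").
    Returns W of shape (d, d) and b of shape (d,).
    """
    n, d = child_reps.shape
    # Append a constant feature so the bias b is learned jointly with W.
    X = np.hstack([child_reps, np.ones((n, 1))])            # (n, d + 1)
    # Solve the ridge normal equations: (X^T X + reg * I) A = X^T Y.
    A = np.linalg.solve(X.T @ X + reg * np.eye(d + 1), X.T @ parent_reps)
    return A[:-1].T, A[-1]                                  # W: (d, d), b: (d,)
```

Comparing the learned maps across (domain, depth) cells is what then lets the authors separate differences tied to hierarchy from differences tied to domain content.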

Abstract

We investigate how and to what extent hierarchical relations (e.g., Japan ⊂ Eastern Asia ⊂ Asia) are encoded in the internal representations of language models. Building on Linear Relational Concepts, we train linear transformations specific to each hierarchical depth and semantic domain, and characterize representational differences associated with hierarchical relations by comparing these transformations. Going beyond prior work on the representational geometry of hierarchies in LMs, our analysis covers multi-token entities and cross-layer representations. Across multiple domains we learn such transformations and evaluate in-domain generalization to unseen data and cross-domain transfer. Experiments show that, within a domain, hierarchical relations can be linearly recovered from model representations. We then analyze how hierarchical information is encoded in representation space. We find that it is encoded in a relatively low-dimensional subspace and that this subspace tends to be domain-specific. Our main result is that hierarchy representation is highly similar across these domain-specific subspaces. Overall, we find that all models considered in our experiments encode concept hierarchies in the form of highly interpretable linear representations.
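
To make the abstract's subspace claims concrete, here is a hedged sketch of one way to probe them: measure how quickly the singular values of a learned map's deviation from identity decay (a low-dimensional hierarchy subspace), and compare two domains' subspaces via principal angles (cross-domain similarity). Using W − I rather than W, and the rank cutoff k, are illustrative assumptions rather than the paper's stated procedure.

```python
# Hypothetical sketch of the subspace analysis: (i) rapid singular-value decay
# of W - I would indicate a low-dimensional hierarchy subspace, and (ii) small
# principal angles between two domains' subspaces would indicate the maps
# encode hierarchy similarly. Both W - I and the cutoff k are assumptions.
import numpy as np

def top_subspace(W: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal basis (d, k) spanning the top-k left singular directions
    of W - I, i.e., the directions where the map deviates from identity."""
    U, _, _ = np.linalg.svd(W - np.eye(W.shape[0]))
    return U[:, :k]

def principal_angles(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Principal angles (radians) between the subspaces spanned by the
    orthonormal columns of A and B; the singular values of A^T B are
    the cosines of those angles."""
    cosines = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Example (hypothetical): W_geo and W_bio are maps fit in two different domains.
# angles = principal_angles(top_subspace(W_geo, k=16), top_subspace(W_bio, k=16))
# Angles near zero would mirror the paper's cross-domain similarity finding.
```

Principal angles near zero between, say, a geography map's and a biology map's top singular subspaces would correspond to the paper's main result that the hierarchy representation is highly similar across domain-specific subspaces.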