Semantic Structure of Feature Space in Large Language Models

arXiv cs.CL / 5/1/2026


Key Points

  • The paper reports that the geometric relationships among semantic features in large language model hidden states closely align with human psychological associations.
  • It constructs feature vectors for 360 words and projects them onto 32 semantic axes (e.g., beautiful–ugly, soft–hard), finding strong correlations with human ratings on the corresponding semantic scales (a minimal sketch of this projection step follows the list).
  • The authors show that cosine similarities between semantic axes predict how strongly the corresponding scales correlate in human surveys.
  • They further find that variance across the 32 semantic axes concentrates in a low-dimensional subspace, and that manipulating a word along one axis produces predictable spillover changes along other axes, in proportion to the cosine similarity between those axes.
  • Overall, the results argue that LLM features should be analyzed not only in isolation, but also via their geometry, inter-axis relations, and the low-dimensional subspaces they form.
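Neither the summary nor the abstract specifies the exact construction, but the projection and inter-axis comparison can be illustrated with a small sketch. It assumes each semantic axis is the normalized difference between the feature vectors of its two pole words, a word's position on a scale is its dot product with that unit axis, and the relation between two scales is the cosine between their axes; these are illustrative assumptions, not the paper's verified procedure, and the vectors below are random stand-ins rather than real hidden states.

```python
import numpy as np

def semantic_axis(pos_vec: np.ndarray, neg_vec: np.ndarray) -> np.ndarray:
    """Unit vector pointing from the negative pole to the positive pole."""
    axis = pos_vec - neg_vec
    return axis / np.linalg.norm(axis)

def project(word_vec: np.ndarray, axis: np.ndarray) -> float:
    """Scalar position of a word along a unit-length semantic axis."""
    return float(word_vec @ axis)

def axis_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two unit-length semantic axes."""
    return float(a @ b)

# Random stand-ins for hidden-state feature vectors (hypothetical, for illustration only).
rng = np.random.default_rng(0)
dim = 768
beautiful, ugly, soft, hard, rose = (rng.normal(size=dim) for _ in range(5))

beauty_axis = semantic_axis(beautiful, ugly)    # beautiful–ugly axis
softness_axis = semantic_axis(soft, hard)       # soft–hard axis

print("'rose' on beautiful–ugly:", project(rose, beauty_axis))
print("cosine(beautiful–ugly, soft–hard):", axis_cosine(beauty_axis, softness_axis))
```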

Abstract

We show that the geometric relations between semantic features in large language models' hidden states closely mirror human psychological associations. We construct feature vectors corresponding to 360 words, project them onto 32 semantic axes (e.g., beautiful–ugly, soft–hard), and find that these projections correlate highly with human ratings of those words on the respective semantic scales. Second, we find that the cosine similarities between the semantic axes themselves are highly predictive of the correlations between these scales in the survey. Third, we show that substantial variance across the 32 semantic axes lies in a low-dimensional subspace, reproducing patterns typical of human semantic associations. Finally, we demonstrate that steering a word along one semantic axis causes spillover effects on the model's rating of that word on other semantic scales, proportional to the cosine similarity between those semantic axes. These findings suggest that features should be understood not only in isolation but also through their geometric relations and the meaningful subspaces they form.
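The last two findings, low-dimensional structure and steering spillover, follow from simple geometry and can be sketched under the same illustrative assumptions: the variance captured by a k-dimensional subspace of a words-by-axes projection matrix comes from its singular values, and shifting a word by delta along one unit axis changes its projection on another unit axis by delta times the cosine between them. The data below is synthetic and only demonstrates the geometry, not the paper's actual measurements.

```python
import numpy as np

def subspace_variance(projections: np.ndarray, k: int) -> float:
    """Fraction of total variance in an (n_words, n_axes) projection matrix
    captured by its top-k principal components, via SVD of the centered matrix."""
    centered = projections - projections.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    return float((s[:k] ** 2).sum() / (s ** 2).sum())

def predicted_spillover(delta: float, steer_axis: np.ndarray, other_axis: np.ndarray) -> float:
    """Expected change on `other_axis` when a word is shifted by `delta`
    along `steer_axis`: delta times the cosine between the two axes."""
    cos = steer_axis @ other_axis / (np.linalg.norm(steer_axis) * np.linalg.norm(other_axis))
    return float(delta * cos)

# Synthetic projections: 360 words on 32 axes, built from a few shared latent
# factors so the axes are correlated by construction.
rng = np.random.default_rng(1)
latent = rng.normal(size=(360, 4))
mixing = rng.normal(size=(4, 32))
ratings = latent @ mixing + 0.1 * rng.normal(size=(360, 32))

print("variance in top-4 subspace:", subspace_variance(ratings, k=4))

steer = rng.normal(size=768)
other = steer + 0.5 * rng.normal(size=768)   # an axis partly aligned with `steer`
print("predicted spillover for a +1.0 steer:", predicted_spillover(1.0, steer, other))
```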