Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds

arXiv cs.CL / 4/14/2026

Key Points

  • The paper analyzes 21-emotion vector representations extracted from twelve small language models spanning six architectures (base and instruct variants of each) using a unified comprehension-mode pipeline at fp16 precision, then compares the resulting emotion “geometries” via representational similarity analysis (cosine RDMs correlated with Spearman’s rho; see the sketch after this list).
  • Results show a striking universality: five mature model families (Qwen 2.5, SmolLM2, Llama 3.2, Mistral 7B, Llama 3.1) share nearly identical 21-emotion geometry, with pairwise RDM Spearman correlations typically in the 0.74–0.92 range.
  • The universality holds even when models differ strongly in behavior (e.g., Qwen 2.5 and Llama 3.2 sit at opposite poles of MTI Compliance facets yet show highly similar emotion representations, rho ≈ 0.81), implying behavioral differences may arise “above” a shared emotional representation layer.
  • The outlier, Gemma-3 1B base, shows severely degenerate representations (residual-stream anisotropy ≈ 0.997; a sketch of this metric follows the abstract) and is restructured by RLHF across all geometric descriptors, while within the mature families base and instruct RDMs correlate very highly (e.g., Mistral 7B v0.3 at rho ≈ 0.985), suggesting RLHF mainly reshapes representations that are not yet organized.
  • Methodologically, the authors argue that prior “method effect” conclusions conflate several distinct factors: a method-dependent dissociation, sub-parameter sensitivity within generation, a genuine fp16-vs-INT8 precision effect, and a cross-experiment bias. A single cross-study similarity number (one rho) is therefore misleading unless these layers are decomposed.
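
To make the first key point concrete, here is a minimal sketch of the second-order comparison the paper describes: build a cosine RDM over each model's 21 emotion vectors, then correlate the two RDMs' off-diagonal entries with Spearman's rho. The hidden sizes and random placeholder vectors below are illustrative assumptions, not the paper's data or code.

```python
# Minimal RSA sketch: compare two models' 21-emotion geometries.
# Shapes and placeholder vectors are assumptions for illustration.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def cosine_rdm(emotion_vectors: np.ndarray) -> np.ndarray:
    """Condensed RDM: 1 - cosine similarity for each emotion pair.
    For 21 emotions this is a vector of 21*20/2 = 210 dissimilarities."""
    return pdist(emotion_vectors, metric="cosine")

def rdm_spearman(vecs_a: np.ndarray, vecs_b: np.ndarray) -> float:
    """Second-order similarity: Spearman rho between two models' RDMs.
    Hidden sizes may differ; only the pairwise geometry is compared."""
    rho, _ = spearmanr(cosine_rdm(vecs_a), cosine_rdm(vecs_b))
    return float(rho)

# Placeholder emotion vectors for two hypothetical models:
rng = np.random.default_rng(0)
model_a = rng.normal(size=(21, 2048))  # e.g., a ~1.5B model's hidden size
model_b = rng.normal(size=(21, 3072))  # e.g., a ~3B model's hidden size
print(f"RDM Spearman rho = {rdm_spearman(model_a, model_b):.3f}")
```

Because the RDM abstracts away the embedding dimension, this comparison works across architectures with different hidden sizes, which is what lets the study line up 1B–8B models against one another.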

Abstract

We extract 21-emotion vector sets from twelve small language models (six architectures × base/instruct, 1B–8B parameters) under a unified comprehension-mode pipeline at fp16 precision, and compare the resulting geometries via representational similarity analysis on raw cosine RDMs. The five mature architectures (Qwen 2.5 1.5B, SmolLM2 1.7B, Llama 3.2 3B, Mistral 7B v0.3, Llama 3.1 8B) share nearly identical 21-emotion geometry, with pairwise RDM Spearman correlations of 0.74–0.92. This universality persists across diametrically opposed behavioral profiles: Qwen 2.5 and Llama 3.2 occupy opposite poles of MTI Compliance facets yet produce nearly identical emotion RDMs (rho = 0.81), so behavioral facet differences arise above the shared emotion representation. Gemma-3 1B base, the one immature case in our dataset, exhibits extreme residual-stream anisotropy (0.997) and is restructured by RLHF across all geometric descriptors, whereas the five already-mature families show within-family base × instruct RDM correlations of rho ≥ 0.92 (Mistral 7B v0.3 at rho = 0.985), suggesting RLHF restructures only representations that are not yet organized. Methodologically, we show that what prior work has read as a single comprehension-vs-generation method effect in fact decomposes into four distinct layers: a coarse method-dependent dissociation, robust sub-parameter sensitivity within generation, a true precision (fp16 vs INT8) effect, and a conflated cross-experiment bias that distorts in opposite directions for different models. A single rho between two prior emotion-vector studies is therefore not a safe basis for interpretation without this layered decomposition.
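
The anisotropy figure quoted above (0.997 for Gemma-3 1B base) is commonly computed as the mean pairwise cosine similarity of sampled residual-stream vectors; values near 1 mean the representations collapse into a narrow cone. The sketch below uses that common definition with random placeholder states; the paper's exact estimator and sampling procedure may differ.

```python
# Sketch of a common anisotropy estimator: mean pairwise cosine
# similarity over sampled residual-stream states. The paper's exact
# estimator may differ; the inputs here are random placeholders.
import numpy as np

def anisotropy(states: np.ndarray) -> float:
    """Mean cosine similarity over all distinct pairs of rows.
    Near 0 for isotropic vectors; near 1 for a collapsed cone."""
    unit = states / np.linalg.norm(states, axis=1, keepdims=True)
    sims = unit @ unit.T                # (n, n) cosine-similarity matrix
    n = len(unit)
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))

rng = np.random.default_rng(0)
isotropic = rng.normal(size=(500, 1024))          # random Gaussian baseline
cone = isotropic + 25.0                           # shared offset -> narrow cone
print(f"isotropic: {anisotropy(isotropic):.3f}")  # ~0.00
print(f"collapsed: {anisotropy(cone):.3f}")       # close to 1
```

When nearly every direction shares a dominant common component, cosine-based geometry carries little information, which is consistent with treating Gemma-3 1B base as the one "immature" outlier whose emotion geometry only becomes organized after RLHF.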