Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control
arXiv cs.CL / 4/6/2026
Key Points
- The paper proposes a method for discovering a valence–arousal (VA) subspace in large language model representations: it learns emotion steering vectors from 211k emotion-labeled texts and fits VA axes via ridge regression on self-reported VA scores.
- It reports that projections onto the learned VA subspace align with human-crowdsourced VA ratings across 44k lexical items and that steering along these axes yields monotonic changes in the model’s affective behavior.
- The method also achieves near-monotonic, bidirectional control over refusal and sycophancy: increasing arousal decreases refusal and increases sycophancy, and reversing the arousal direction flips both effects.
- Experiments reportedly generalize across multiple architectures (Llama-3.1-8B, Qwen3-8B, and Qwen3-14B) and include a mechanistic explanation tied to refusal-associated tokens occupying low-arousal/negative-valence regions.
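The fitting-and-steering recipe in the bullets above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the activations and VA labels here are synthetic stand-ins (in the paper they would come from emotion-labeled texts run through the LLM), and all variable names (`H`, `Y`, `W`, `steer_arousal`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: H holds layer activations (n examples x d dims),
# Y holds self-reported (valence, arousal) scores. A linear structure is
# planted so the ridge fit has something to recover.
n, d = 1000, 64
W_true = rng.normal(size=(d, 2))                 # hidden "true" VA directions
H = rng.normal(size=(n, d))                      # stand-in activations
Y = H @ W_true + 0.1 * rng.normal(size=(n, 2))   # noisy VA labels

# Closed-form ridge regression fits the 2D VA subspace:
#   W = (H^T H + lam * I)^{-1} H^T Y
# Each column of W is one learned axis (valence, arousal).
lam = 1.0
W = np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ Y)

def va_project(h):
    """Project an activation onto the learned VA subspace."""
    return h @ W  # -> (valence, arousal) coordinates

# Steering: shift an activation along the unit-normalized arousal axis.
arousal_axis = W[:, 1] / np.linalg.norm(W[:, 1])

def steer_arousal(h, alpha):
    """Add a scaled arousal direction to an activation vector."""
    return h + alpha * arousal_axis

h = H[0]
h_up = steer_arousal(h, 4.0)
# Larger alpha should monotonically raise the arousal projection.
assert va_project(h_up)[1] > va_project(h)[1]
```

In an actual LLM, the steering step would be applied to the residual stream at a chosen layer during generation; the monotonic behavioral changes the paper reports correspond to sweeping `alpha` along each axis.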