From Syntax to Emotion: A Mechanistic Analysis of Emotion Inference in LLMs

arXiv cs.CL / April 29, 2026


Key Points

  • The paper uses sparse autoencoders to probe how LLMs internally represent emotion recognition, finding a consistent three-phase information flow where emotion-relevant features appear only in the final phase.
  • It shows that emotion representations are built from both shared features across emotions and emotion-specific features, with different emotions relying on different causal mechanisms.
  • Phase-stratified causal tracing identifies a small set of influential features that drive emotion predictions, and the number and causal impact of these features vary by emotion—disgust appears more weakly and diffusely represented.
  • The authors propose a causal feature steering method that is interpretable and data-efficient, improving emotion recognition performance across multiple models while largely preserving language modeling ability, and the gains generalize across multiple emotion datasets.
  • Overall, the work offers a systematic mechanistic account of emotion inference in LLMs and a practical, controllable intervention for boosting performance in emotionally sensitive applications.
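To make the SAE-based analysis concrete, here is a minimal sketch of how sparse feature activations are typically read out of a hidden state with a ReLU-encoder sparse autoencoder. The dimensions and weights below are toy placeholders for illustration (the paper's actual SAEs, dictionary sizes, and trained weights are not shown here); real dictionaries are far larger than the model's hidden size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: d_model = hidden size, d_dict = SAE dictionary size.
# Real SAEs use the model's hidden width and a much larger dictionary.
d_model, d_dict = 8, 32

# Hypothetical (randomly initialized) SAE weights, standing in for a
# pre-trained autoencoder.
W_enc = rng.normal(size=(d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(size=(d_dict, d_model))

def sae_features(h):
    """Encode a hidden state into sparse, non-negative feature activations."""
    return np.maximum(h @ W_enc + b_enc, 0.0)

h = rng.normal(size=d_model)   # a residual-stream activation at some layer
f = sae_features(h)            # sparse feature vector over the dictionary
recon = f @ W_dec              # SAE reconstruction of the hidden state

active = np.flatnonzero(f)     # which dictionary features fired for this input
print(len(active), "of", d_dict, "features active")
```

Tracking which features in `active` appear at which layers, and for which emotion labels, is the kind of layerwise analysis that could surface the three-phase flow and the shared-versus-emotion-specific split the paper reports.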

Abstract

Large language models (LLMs) are increasingly used in emotionally sensitive human-AI applications, yet little is known about how emotion recognition is internally represented. In this work, we investigate the internal mechanisms of emotion recognition in LLMs using sparse autoencoders (SAEs). By analyzing sparse feature activations across layers, we identify a consistent three-phase information flow, in which emotion-related features emerge only in the final phase. We further show that emotion representations comprise both shared features across emotions and emotion-specific features. Using phase-stratified causal tracing, we identify a small set of features that strongly influence emotion predictions, and show that both their number and causal impact vary across emotions; in particular, Disgust is more weakly and diffusely represented than other emotions. Finally, we propose an interpretable and data-efficient causal feature steering method that significantly improves emotion recognition performance across multiple models while largely preserving language modeling ability, and demonstrate that these improvements generalize across multiple emotion recognition datasets. Overall, our findings provide a systematic analysis of the internal mechanisms underlying emotion recognition in LLMs and introduce an efficient, interpretable, and controllable approach for improving model performance.
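The causal feature steering idea can be sketched as adding the decoder directions of a few influential SAE features back into the residual stream, scaled by a steering coefficient. This is a generic illustration under assumed names (`W_dec`, `steer`, `alpha`, the chosen `feature_ids` are all hypothetical), not the paper's exact intervention.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_dict = 8, 32
W_dec = rng.normal(size=(d_dict, d_model))  # hypothetical SAE decoder weights

def steer(h, feature_ids, alpha=2.0):
    """Nudge a hidden state along the (normalized) sum of the decoder
    directions of the selected features, scaled by alpha."""
    direction = W_dec[feature_ids].sum(axis=0)
    return h + alpha * direction / np.linalg.norm(direction)

h = rng.normal(size=d_model)                # original residual-stream state
h_steered = steer(h, feature_ids=[3, 17])   # amplify two chosen features
```

Because the intervention touches only a handful of named features, it stays interpretable, and since no gradient updates are involved it is data-efficient; choosing `alpha` small keeps the perturbation local, which is consistent with the reported preservation of language modeling ability.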