
Large language models show fragile cognitive reasoning about human emotions

arXiv cs.CL / March 16, 2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • Drawing on cognitive appraisal theory, the paper introduces the CoRE benchmark to probe the implicit cognitive structures LLMs use when interpreting emotionally charged situations.
  • The study finds LLMs capture systematic relations between cognitive appraisals and emotions but show misalignment with human judgments and instability across contexts.
  • The evaluation covers four axes: alignment with human appraisal patterns, internal consistency, cross-model generalization, and robustness to contextual variation (see the sketch after this list).
  • The results highlight the fragility of LLM-based emotion reasoning, with implications for affective computing research and for how AI emotion understanding is evaluated.

Abstract

Affective computing seeks to support the holistic development of artificial intelligence by enabling machines to engage with human emotion. Recent foundation models, particularly large language models (LLMs), have been trained and evaluated on emotion-related tasks, typically using supervised learning with discrete emotion labels. Such evaluations largely focus on surface phenomena, such as recognizing expressed or evoked emotions, leaving open whether these systems reason about emotion in cognitively meaningful ways. Here we ask whether LLMs can reason about emotions through underlying cognitive dimensions rather than labels alone. Drawing on cognitive appraisal theory, we introduce CoRE, a large-scale benchmark designed to probe the implicit cognitive structures LLMs use when interpreting emotionally charged situations. We assess alignment with human appraisal patterns, internal consistency, cross-model generalization, and robustness to contextual variation. We find that LLMs capture systematic relations between cognitive appraisals and emotions but show misalignment with human judgments and instability across contexts.