Large language models show fragile cognitive reasoning about human emotions
arXiv cs.CL / 3/16/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The CoRE benchmark is introduced to probe the implicit cognitive structures LLMs use to interpret emotionally charged situations, grounded in cognitive appraisal theory.
- The study finds that LLMs capture systematic relations between cognitive appraisals and emotions, but their judgments diverge from human patterns and are unstable across contexts.
- The evaluation covers four axes: alignment with human judgment patterns, internal consistency, cross-model generalization, and robustness to contextual variation.
- The results highlight fragility in LLM-based emotion reasoning and have implications for affective computing research and how we evaluate AI emotion understanding.
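The paper's exact metrics aren't specified in this summary, but two of the axes above can be sketched concretely. A minimal illustration, assuming alignment is measured as a correlation between human and model appraisal ratings, and robustness as the spread of a model's ratings across paraphrases of the same scenario (all ratings below are hypothetical):

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation between two equal-length rating lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical appraisal ratings (1-7 scale) for six scenarios.
human = [6, 2, 5, 3, 7, 1]
model = [5, 3, 5, 2, 6, 2]

# Alignment axis: how well model ratings track human ratings.
alignment = pearson(human, model)

# Robustness axis: the same scenario paraphrased four ways;
# a larger spread indicates instability across contexts.
model_across_paraphrases = [5, 3, 6, 4]
instability = statistics.stdev(model_across_paraphrases)
```

A model could score high on alignment (ratings correlate with humans on average) while still scoring poorly on robustness (ratings swing with surface wording), which is consistent with the fragility the study reports.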