EmoTrans: A Benchmark for Understanding, Reasoning, and Predicting Emotion Transitions in Multimodal LLMs
arXiv cs.CV / 4/28/2026
Key Points
- The paper introduces EmoTrans, a new benchmark that evaluates how well multimodal LLMs understand emotion as a dynamic process, rather than testing static emotion recognition alone.
- EmoTrans includes 1,000 manually annotated multimodal video clips across 12 real-world scenarios, along with 3,000+ task-specific QA pairs for fine-grained assessment.
- It defines four progressively challenging tasks (Emotion Change Detection, Emotion State Identification, Emotion Transition Reasoning, and Next Emotion Prediction) that test detection, reasoning, and forecasting of emotion transitions.
- Evaluations on 18 state-of-the-art MLLMs show stronger performance on coarse change detection but persistent difficulty in fine-grained emotion-dynamics modeling, with multi-person social contexts remaining particularly challenging.
- The authors publicly release the benchmark, evaluation protocol, and code in a GitHub repository to support future research.
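
To make the four-task setup concrete, the sketch below shows one way per-task accuracy over QA pairs could be computed. It is a minimal illustration, not the authors' evaluation protocol: only the four task names come from the summary above, while the `QAPair` fields, the `per_task_accuracy` helper, and the exact-match scoring rule are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Task names are taken from the summary; everything else here is hypothetical.
TASKS = (
    "Emotion Change Detection",
    "Emotion State Identification",
    "Emotion Transition Reasoning",
    "Next Emotion Prediction",
)


@dataclass(frozen=True)
class QAPair:
    """One task-specific question about a video clip (assumed schema)."""
    clip_id: str
    task: str
    question: str
    answer: str


def per_task_accuracy(pairs, predictions):
    """Return {task: accuracy} using case-insensitive exact match.

    Exact match is an assumption for illustration; the real benchmark
    may score answers differently.
    """
    if len(pairs) != len(predictions):
        raise ValueError("pairs and predictions must have the same length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for pair, predicted in zip(pairs, predictions):
        if pair.task not in TASKS:
            raise ValueError(f"unknown task: {pair.task!r}")
        total[pair.task] += 1
        if predicted.strip().casefold() == pair.answer.strip().casefold():
            correct[pair.task] += 1

    # Only tasks that actually appear in the input are reported.
    return {task: correct[task] / total[task] for task in TASKS if task in total}


if __name__ == "__main__":
    demo_pairs = [
        QAPair("clip_001", TASKS[0], "Does the speaker's emotion change?", "yes"),
        QAPair("clip_001", TASKS[1], "What is the emotion after the change?", "sad"),
        QAPair("clip_002", TASKS[0], "Does the speaker's emotion change?", "no"),
    ]
    demo_predictions = ["Yes", "happy", "yes"]
    for task, accuracy in per_task_accuracy(demo_pairs, demo_predictions).items():
        print(f"{task}: {accuracy:.2f}")
```

Reporting accuracy separately for each task, rather than as one pooled number, is what would let a gap between coarse change detection and the finer-grained tasks show up in the results.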


