EmoMM: Benchmarking and Steering MLLM for Multimodal Emotion Recognition under Conflict and Missingness
arXiv cs.CV / 5/5/2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- The paper introduces EmoMM, a benchmark for Multimodal Emotion Recognition that explicitly includes modality-aligned, modality-conflict, and missing-modality subsets to study MLLM behavior in realistic conditions.
- Extensive experiments reveal a “Video Contribution Collapse (VCC)” phenomenon in which MLLMs often downplay video evidence when token redundancy is high and modality preferences skew decisions.
- To mitigate this without retraining, the authors propose CHASE (Conflict-aware Head-level Attention Steering), a lightweight, inference-time method that detects modality conflicts and steers individual attention heads to reduce decision bias (see the sketch after this list).
- Results show CHASE improves performance across multiple settings, making MLLM-based emotion recognition more reliable in complex affective scenarios involving conflicts and missing inputs.
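The key points only describe CHASE at a high level, so the snippet below is a rough, hypothetical illustration of what inference-time, head-level attention steering could look like in PyTorch: a conflict signal computed from unimodal emotion predictions gates a rescaling of video-sensitive attention heads. All names (`conflict_score`, `steer_heads`), the head indices, and the threshold are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of inference-time head-level attention steering.
# Not the paper's code: function names, head indices, and thresholds are illustrative.
import torch
import torch.nn.functional as F

def conflict_score(text_logits: torch.Tensor, video_logits: torch.Tensor) -> float:
    """Assumed conflict signal: Jensen-Shannon-style divergence between
    unimodal emotion distributions. Higher values suggest modality conflict."""
    p = F.softmax(text_logits, dim=-1)
    q = F.softmax(video_logits, dim=-1)
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a.clamp_min(1e-8).log() - b.clamp_min(1e-8).log())).sum(-1)
    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))

def steer_heads(per_head_out: torch.Tensor, video_head_idx, gain: float) -> torch.Tensor:
    """Rescale the outputs of selected attention heads before they are merged.
    per_head_out: (batch, heads, seq, head_dim); video_head_idx: heads assumed
    (e.g. via an offline probe) to carry video evidence."""
    scale = torch.ones(per_head_out.shape[1], device=per_head_out.device)
    scale[video_head_idx] = gain                     # boost video-sensitive heads
    return per_head_out * scale.view(1, -1, 1, 1)

# Toy usage: 2 examples, 8 heads, 16 tokens, 64-dim heads, 7 emotion classes.
per_head_out = torch.randn(2, 8, 16, 64)
text_logits, video_logits = torch.randn(2, 7), torch.randn(2, 7)

if conflict_score(text_logits[0], video_logits[0]) > 0.1:   # threshold is illustrative
    per_head_out = steer_heads(per_head_out, video_head_idx=[1, 4, 6], gain=1.5)
```

In practice such a rescaling would be applied inside the MLLM's attention layers (e.g. via forward hooks) rather than on a standalone tensor; the point is only that the intervention operates per head at inference time, with no retraining.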