
Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition

arXiv cs.AI / 3/18/2026


Key Points

  • The paper introduces HyDRA, a Hybrid-evidential Deductive Reasoning Architecture for open-vocabulary multimodal emotion recognition (OV-MER), targeting the ambiguity of multimodal cues that stem from distinct, unobserved situational dynamics.
  • It models inference as a Propose-Verify-Decide protocol and leverages reinforcement learning with hierarchical reward shaping to align reasoning trajectories with final task performance.
  • The approach yields improved performance over strong baselines, particularly in ambiguous or conflicting multimodal scenarios, with interpretable diagnostic evidence traces.
  • The framework emphasizes reconstructing nuanced emotional states by synthesizing multiple evidence-grounded rationales from diverse latent perspectives, moving beyond surface-level associations.
  • Systematic evaluations validate the design choices, suggesting practical benefits for robust OV-MER systems.
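
To make the Propose-Verify-Decide protocol from the key points more concrete, the following is a minimal toy sketch, not the paper's implementation. Everything in it is an assumption made for illustration: the `Hypothesis` class, the `propose`, `verify`, and `decide` functions, the support-fraction score, and the `threshold` parameter are all hypothetical, and each modality is reduced to a hand-written set of candidate labels instead of the output of an MLLM.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    label: str      # candidate open-vocabulary emotion label
    rationale: str  # free-text explanation attached to the label


def propose(cues):
    """Propose: emit one hypothesis per label suggested by any modality."""
    labels = sorted({label for votes in cues.values() for label in votes})
    return [Hypothesis(l, f"candidate explanation for '{l}'") for l in labels]


def verify(hyp, cues):
    """Verify: score a hypothesis by the fraction of modalities supporting it."""
    supporting = [m for m, votes in cues.items() if hyp.label in votes]
    return len(supporting) / len(cues), supporting


def decide(cues, threshold=0.5):
    """Decide: keep hypotheses whose support meets the (assumed) threshold."""
    kept = []
    for hyp in propose(cues):
        score, evidence = verify(hyp, cues)
        if score >= threshold:
            # The evidence list acts as a small, inspectable trace.
            kept.append((hyp.label, score, evidence))
    # Highest support first; ties broken alphabetically by label.
    return sorted(kept, key=lambda t: (-t[1], t[0]))


# Hypothetical, conflicting cues: the text and the face disagree.
cues = {
    "text": {"gratitude"},
    "audio": {"sadness", "gratitude"},
    "video": {"sadness"},
}
result = decide(cues)
for label, score, evidence in result:
    print(label, round(score, 2), evidence)
```

In this toy setting both labels survive, which mirrors the idea of keeping several evidence-grounded rationales for a conflicting clip instead of committing early to a single dominant one.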

Abstract

Open-Vocabulary Multimodal Emotion Recognition (OV-MER) is inherently challenging due to the ambiguity of equivocal multimodal cues, which often stem from distinct unobserved situational dynamics. While Multimodal Large Language Models (MLLMs) offer extensive semantic coverage, their performance is often bottlenecked by premature commitment to dominant data priors, resulting in suboptimal heuristics that overlook crucial, complementary affective cues across modalities. We argue that effective affective reasoning requires more than surface-level association; it necessitates reconstructing nuanced emotional states by synthesizing multiple evidence-grounded rationales that reconcile these observations from diverse latent perspectives. We introduce HyDRA, a Hybrid-evidential Deductive Reasoning Architecture that formalizes inference as a Propose-Verify-Decide protocol. To internalize this abductive process, we employ reinforcement learning with hierarchical reward shaping, aligning the reasoning trajectories with final task performance to ensure they best reconcile the observed multimodal cues. Systematic evaluations validate our design choices, with HyDRA consistently outperforming strong baselines, especially in ambiguous or conflicting scenarios, while providing interpretable, diagnostic evidence traces.
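
The abstract does not spell out how the hierarchical reward is built, so the sketch below is only one plausible reading, under loudly stated assumptions: three levels (trace structure, evidence grounding, task score), a gate that zeroes the reward when any stage is missing, the weights `w_format`, `w_evidence`, and `w_task`, the trace layout, and the use of set-level F1 as the task score are all hypothetical choices, not details taken from the paper.

```python
def hierarchical_reward(trace, predicted, gold,
                        w_format=0.1, w_evidence=0.3, w_task=0.6):
    """Toy three-level reward; all levels and weights are assumptions."""
    # Level 1 (structure): every protocol stage must be present and non-empty,
    # otherwise the higher levels are not rewarded at all.
    stages = ("propose", "verify", "decide")
    if not all(trace.get(stage) for stage in stages):
        return 0.0
    reward = w_format

    # Level 2 (evidence): fraction of verified claims that cite a modality.
    claims = trace["verify"]
    grounded = sum(1 for claim in claims if claim.get("modality"))
    reward += w_evidence * grounded / len(claims)

    # Level 3 (task): set-level F1 between predicted and gold label sets.
    predicted, gold = set(predicted), set(gold)
    overlap = len(predicted & gold)
    if overlap:
        precision = overlap / len(predicted)
        recall = overlap / len(gold)
        reward += w_task * 2 * precision * recall / (precision + recall)
    return reward


# Hypothetical reasoning trace with one grounded and one ungrounded claim.
trace = {
    "propose": ["sadness", "gratitude"],
    "verify": [
        {"claim": "voice trembles", "modality": "audio"},
        {"claim": "seems tired", "modality": None},
    ],
    "decide": ["sadness"],
}
reward = hierarchical_reward(trace, ["sadness"], ["sadness", "gratitude"])
print(round(reward, 3))
```

The gating in level 1 is the design choice that makes the reward hierarchical in this sketch: a trajectory that skips a stage earns nothing, so the final-answer score cannot be collected without a complete reasoning trace.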