AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition
arXiv cs.CV / 4/15/2026
Key Points
- AffectAgent is a multi-agent retrieval-augmented multimodal emotion recognition framework designed to reduce hallucinations and better capture nuanced affective states across modalities.
- The system uses three specialized, jointly optimized agents—a query planner, an evidence filter, and an emotion generator—to retrieve cross-modal evidence, assess it, and produce emotion predictions.
- AffectAgent is end-to-end trained with Multi-Agent Proximal Policy Optimization (MAPPO) using a shared affective reward to align the agents’ collaborative reasoning.
- It introduces Modality-Balancing Mixture of Experts (MB-MoE) to dynamically weight modalities and mitigate cross-modal representation mismatches, and Retrieval-Augmented Adaptive Fusion (RAAF) to improve predictions when a modality is missing.
- Experiments on MER-UniBench report that AffectAgent achieves stronger performance than prior approaches, and the authors plan to release the code publicly.
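The paper itself is not reproduced here, so the details of MB-MoE are not available; as a rough illustration of the general idea of dynamically weighting modalities (and zeroing out a missing one, as RAAF must handle), here is a minimal sketch. The function name `mb_moe_fuse`, the gating matrix `gate_w`, and the dot-product gating rule are all hypothetical stand-ins, not the authors' method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; -inf logits map to weight 0.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mb_moe_fuse(modality_feats, gate_w, mask=None):
    """Fuse per-modality feature vectors with learned gate weights.

    modality_feats: dict of modality name -> (d,) feature vector
    gate_w: (num_modalities, d) hypothetical gating matrix (learned in practice)
    mask: optional dict of modality name -> bool; missing modalities
          receive zero fusion weight (a stand-in for RAAF's behavior)
    Returns the fused (d,) vector and the per-modality weights.
    """
    names = sorted(modality_feats)
    feats = np.stack([modality_feats[n] for n in names])    # (M, d)
    logits = (gate_w[: len(names)] * feats).sum(axis=1)     # (M,) gating scores
    if mask is not None:
        avail = np.array([mask.get(n, True) for n in names])
        logits = np.where(avail, logits, -np.inf)           # drop missing modalities
    weights = softmax(logits)
    fused = (weights[:, None] * feats).sum(axis=0)          # weighted sum over modalities
    return fused, dict(zip(names, weights))
```

In this toy form, a uniform (zero) gating matrix reduces fusion to an average over the available modalities; the point is only that the fusion weights are input-dependent and that an absent modality contributes nothing.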


