Enhance-then-Balance Modality Collaboration for Robust Multimodal Sentiment Analysis

arXiv cs.CL / 4/15/2026

Key Points

  • The paper targets robustness issues in multimodal sentiment analysis caused by modality imbalance, where a dominant modality can overwhelm weaker ones (in practice, the non-verbal audio and visual signals) and degrade fusion quality.
  • It introduces the Enhance-then-Balance Modality Collaboration (EBMC) framework, which uses semantic disentanglement and cross-modal enhancement to improve representation quality for weaker modalities.
  • To mitigate dominance effects, EBMC adds an Energy-guided Modality Coordination mechanism that performs implicit gradient rebalancing through a differentiable equilibrium objective.
  • It further improves robustness under noisy or missing modalities with Instance-aware Modality Trust Distillation, which estimates sample-level reliability to adapt fusion weights.
  • Experiments report state-of-the-art or competitive multimodal sentiment results and strong performance specifically in missing-modality scenarios.
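
The sample-level trust idea in the points above can be illustrated with a minimal sketch. This is not the paper's Instance-aware Modality Trust Distillation: the summary does not specify how reliability is estimated or distilled, so the function name `trust_weighted_fusion`, the per-modality trust logits, and the `present` mask are all hypothetical. The sketch only shows how per-sample reliability scores could turn into fusion weights, and how a missing modality can be given zero weight.

```python
import math


def trust_weighted_fusion(features, trust_scores, present):
    """Fuse per-modality feature vectors for a single sample.

    features:     dict modality -> list[float], all the same length
    trust_scores: dict modality -> float, hypothetical reliability logits
    present:      dict modality -> bool, False when the modality is missing
    """
    available = [m for m in features if present[m]]
    if not available:
        raise ValueError("at least one modality must be present")

    # Softmax over the available modalities only (assumed design choice),
    # so a missing modality always receives weight 0.
    top = max(trust_scores[m] for m in available)
    exps = {m: math.exp(trust_scores[m] - top) for m in available}
    total = sum(exps.values())
    weights = {m: (exps[m] / total if m in exps else 0.0) for m in features}

    # Weighted sum of the available feature vectors.
    dim = len(features[available[0]])
    fused = [
        sum(weights[m] * features[m][i] for m in available)
        for i in range(dim)
    ]
    return fused, weights


# Toy sample: the visual stream is missing, and text is trusted most.
features = {"text": [1.0, 0.0], "audio": [0.0, 1.0], "visual": [0.5, 0.5]}
trust = {"text": 2.0, "audio": 0.0, "visual": 0.0}
present = {"text": True, "audio": True, "visual": False}
fused, weights = trust_weighted_fusion(features, trust, present)
```

In this toy call the weights for text and audio sum to 1 and the visual weight is 0, so the fused vector is built only from the modalities that are actually present.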

Abstract

Multimodal sentiment analysis (MSA) integrates heterogeneous text, audio, and visual signals to infer human emotions. While recent approaches leverage cross-modal complementarity, they often struggle to fully utilize weaker modalities. In practice, dominant modalities tend to overshadow non-verbal ones, inducing modality competition and limiting the contribution of the weaker modalities. This imbalance degrades fusion performance and robustness under noisy or missing modalities. To address this, we propose the Enhance-then-Balance Modality Collaboration (EBMC) framework. EBMC improves representation quality via semantic disentanglement and cross-modal enhancement, strengthening weaker modalities. To prevent dominant modalities from overwhelming others, an Energy-guided Modality Coordination mechanism achieves implicit gradient rebalancing via a differentiable equilibrium objective. Furthermore, Instance-aware Modality Trust Distillation estimates sample-level reliability to adaptively modulate fusion weights, improving robustness. Extensive experiments demonstrate that EBMC achieves state-of-the-art or competitive results and maintains strong performance under missing-modality settings.
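
The abstract does not give the form of the equilibrium objective, so the following is only a stand-in to make the idea of "balancing" modalities concrete. It assumes each modality has its own classification head producing logits, scores each head with the standard free energy E = -log Σ exp(logit), and penalizes the variance of those energies across modalities. The names `free_energy` and `balance_penalty` are hypothetical, and the variance penalty is a substitute chosen for simplicity, not the EBMC objective.

```python
import math


def free_energy(logits):
    """Free energy E = -log(sum(exp(logit))), computed stably."""
    top = max(logits)
    return -(top + math.log(sum(math.exp(z - top) for z in logits)))


def balance_penalty(unimodal_logits):
    """Variance of per-modality energies (assumed stand-in objective).

    unimodal_logits: dict modality -> list[float] of class logits
    Returns (penalty, energies). The penalty is zero when every
    modality has the same energy and grows as one modality becomes
    much more confident (lower energy) than the others.
    """
    energies = {m: free_energy(z) for m, z in unimodal_logits.items()}
    mean = sum(energies.values()) / len(energies)
    penalty = sum((e - mean) ** 2 for e in energies.values()) / len(energies)
    return penalty, energies


# Toy sample: the text head is far more confident than audio or visual.
logits = {
    "text": [5.0, 0.0, 0.0],
    "audio": [0.1, 0.0, 0.0],
    "visual": [0.0, 0.0, 0.0],
}
penalty, energies = balance_penalty(logits)
```

Because the penalty is a smooth function of the logits, adding it to a training loss in an autodiff framework would push gradients toward the under-confident heads without hand-tuned per-modality learning rates, which is one plausible reading of "implicit gradient rebalancing".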