Unbiased Dynamic Multimodal Fusion

arXiv cs.CV / 3/23/2026


Key Points

  • UDML introduces a noise-aware uncertainty estimator that corrupts modality data with controlled noise and learns to predict the noise intensity from the modality features, enabling uncertainty measurement across both low- and high-noise conditions.
  • It quantifies inherent modality reliance bias using modality dropout and incorporates this bias into the weighting mechanism to prevent penalizing hard-to-learn modalities.
  • The framework addresses drawbacks of prior dynamic fusion methods by removing assumptions of static modality quality and equal initial contributions, aiming for more robust fusion performance.
  • The authors validate UDML with extensive experiments on diverse multimodal benchmarks and provide the code at https://github.com/shicaiwei123/UDML.

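The first key point can be illustrated with a toy sketch: corrupt clean features with Gaussian noise of a known, randomly drawn intensity, then fit a regressor that recovers that intensity from a corrupted-feature statistic, and use the prediction as the modality's uncertainty. All names, the choice of statistic, and the closed-form least-squares fit below are illustrative assumptions, not the paper's actual implementation (which learns this mapping inside the network).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_training_pairs(clean_feats, n_samples=2000):
    """Build (statistic, sigma) pairs via controlled corruption."""
    stats, sigmas = [], []
    for _ in range(n_samples):
        sigma = rng.uniform(0.0, 2.0)   # controlled noise intensity
        noisy = clean_feats + rng.normal(0.0, sigma, clean_feats.shape)
        stats.append(noisy.std())        # simple feature-level statistic
        sigmas.append(sigma)
    return np.array(stats), np.array(sigmas)

# One modality's clean feature vector (stand-in for a network feature).
clean = rng.normal(0.0, 0.5, size=128)
x, y = make_training_pairs(clean)

# Least-squares line sigma ≈ a*stat + b, a stand-in for a learned head.
a, b = np.polyfit(x, y, 1)

def estimate_uncertainty(feats):
    """Predicted noise intensity serves as the modality uncertainty."""
    return a * feats.std() + b
```

Because the regressor was trained against a ground-truth noise level, its prediction stays calibrated at both ends of the noise range, which is exactly the failure mode of purely empirical quality metrics that the summary describes.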
Abstract

Traditional multimodal methods often assume static modality quality, which limits their adaptability in dynamic real-world scenarios. Dynamic multimodal methods have therefore been proposed to assess modality quality and adjust each modality's contribution accordingly. However, they typically rely on empirical metrics, which fail to measure modality quality when noise levels are extremely low or high. Moreover, existing methods usually assume that the initial contribution of each modality is the same, neglecting the intrinsic modality dependency bias. As a result, the hard-to-learn modality is doubly penalized, and the performance of dynamic fusion can be inferior to that of static fusion. To address these challenges, we propose the Unbiased Dynamic Multimodal Learning (UDML) framework. Specifically, we introduce a noise-aware uncertainty estimator that adds controlled noise to the modality data and predicts its intensity from the modality features. This forces the model to learn a clear correspondence between feature corruption and noise level, allowing accurate uncertainty measurement across both low- and high-noise conditions. Furthermore, we quantify the inherent modality reliance bias within multimodal networks via modality dropout and incorporate it into the weighting mechanism. This eliminates the dual suppression effect on the hard-to-learn modality. Extensive experiments across diverse multimodal benchmark tasks validate the effectiveness, versatility, and generalizability of the proposed UDML. The code is available at https://github.com/shicaiwei123/UDML.
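The bias-aware weighting can be sketched as follows. Here `acc_full` is the fused model's accuracy with all modalities present and `acc_drop[m]` its accuracy with modality `m` zeroed out (modality dropout); the accuracy drop is taken as modality `m`'s inherent reliance bias. Dividing the uncertainty-based score by that bias counteracts the network's built-in preference, so a modality the network under-relies on is not suppressed a second time. The exact combination rule and all names are assumptions for illustration, not UDML's actual weighting formula.

```python
import numpy as np

def reliance_bias(acc_full, acc_drop):
    """Per-modality bias: how much accuracy falls when it is dropped."""
    return {m: acc_full - a for m, a in acc_drop.items()}

def fusion_weights(uncertainty, bias, eps=1e-6):
    """Fusion weights favoring low uncertainty, normalized by reliance bias.

    Dividing by the bias boosts under-relied (hard-to-learn) modalities,
    removing the dual-suppression effect described in the abstract.
    """
    mods = sorted(uncertainty)
    raw = np.array([np.exp(-uncertainty[m]) / max(bias[m], eps)
                    for m in mods])
    w = raw / raw.sum()
    return dict(zip(mods, w))
```

For example, if RGB and depth have equal estimated uncertainty but the network relies far more on RGB (a large accuracy drop when RGB is removed), this rule shifts weight toward depth instead of penalizing it again.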