Harmonized Tabular-Image Fusion via Gradient-Aligned Alternating Learning

arXiv cs.CV / 4/3/2026


Key Points

  • The paper targets multimodal tabular-image fusion, arguing that existing approaches can be derailed by gradient conflicts between modalities during optimization.
  • It introduces a Gradient-Aligned Alternating Learning (GAAL) training paradigm that alternates unimodal learning with a shared classifier to better decouple and coordinate multimodal gradients.
  • GAAL further uses uncertainty-based cross-modal gradient surgery to selectively align gradients coming from different modalities, aiming to steer shared parameters in a way that benefits all modalities.
  • Experiments on widely used datasets report better fusion performance than multiple state-of-the-art tabular-image fusion baselines, as well as baselines designed for tabular data that is missing at test time.
  • The authors provide publicly available source code to support reproduction and further experimentation.
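The summary above does not spell out the exact surgery rule, so the following is only a minimal sketch of what uncertainty-gated gradient surgery on shared parameters could look like. It assumes a PCGrad-style projection as a stand-in for the paper's operator and uses prediction entropy as the uncertainty measure; the function names (`project_out_conflict`, `uncertainty_gated_surgery`) and the gating rule are hypothetical and are not taken from the GAAL source code.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector (assumed uncertainty proxy)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def dot(a, b):
    """Inner product of two equal-length gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out_conflict(g, g_ref):
    """PCGrad-style projection (a stand-in, not the paper's confirmed rule).

    If g conflicts with g_ref (negative inner product), remove the component
    of g that lies along g_ref. Otherwise return g unchanged.
    """
    d = dot(g, g_ref)
    ref_sq = dot(g_ref, g_ref)
    if d >= 0.0 or ref_sq == 0.0:
        return list(g)
    scale = d / ref_sq
    return [x - scale * r for x, r in zip(g, g_ref)]


def uncertainty_gated_surgery(g_tab, g_img, probs_tab, probs_img):
    """Hypothetical gating: only the more uncertain modality's gradient is
    adjusted, using the more confident modality's gradient as reference.
    On an entropy tie, the image gradient is the one adjusted.
    """
    if entropy(probs_tab) > entropy(probs_img):
        return project_out_conflict(g_tab, g_img), list(g_img)
    return list(g_tab), project_out_conflict(g_img, g_tab)


# Toy usage: the tabular prediction is more uncertain and its gradient
# conflicts with the image gradient, so the tabular gradient is projected.
new_tab, new_img = uncertainty_gated_surgery(
    [1.0, 0.0], [-1.0, 1.0], [0.5, 0.5], [0.9, 0.1]
)
```

After the projection, the adjusted tabular gradient no longer has a negative component along the image gradient, so an update on the shared parameters built from both gradients does not pull against the more confident modality.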

Abstract

Multimodal tabular-image fusion is an emerging task that has received increasing attention in various domains. However, existing methods may be hindered by gradient conflicts between modalities, which can mislead the optimization of the unimodal learners. In this paper, we propose a novel Gradient-Aligned Alternating Learning (GAAL) paradigm to address this issue by aligning modality gradients. Specifically, GAAL adopts alternating unimodal learning with a shared classifier to decouple the multimodal gradients and facilitate cross-modal interaction. Furthermore, we design uncertainty-based cross-modal gradient surgery to selectively align cross-modal gradients, thereby steering the shared parameters to benefit all modalities. As a result, GAAL can provide effective unimodal assistance and help boost the overall fusion performance. Experiments on widely used datasets show that our method outperforms various state-of-the-art (SoTA) tabular-image fusion baselines and test-time tabular-missing baselines. The source code is available at https://github.com/njustkmg/ICME26-GAAL.
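To make the idea of alternating unimodal learning with a shared classifier concrete, here is a toy sketch under strong simplifying assumptions: each modality has a linear encoder that produces a scalar embedding, the shared classifier is a single logistic unit, and training uses plain SGD on binary cross-entropy. The names `unimodal_step` and `train_alternating`, the initial weights, and the toy data are all hypothetical; the sketch omits gradient surgery and is not the authors' implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def unimodal_step(enc, head, x, y, lr):
    """One SGD step driven by a single modality.

    Updates that modality's encoder and the shared head in place and
    returns the binary cross-entropy loss computed before the update.
    """
    h = sum(e * v for e, v in zip(enc, x))        # scalar embedding
    p = sigmoid(head["w"] * h + head["b"])        # shared classifier
    p = min(max(p, 1e-12), 1.0 - 1e-12)           # keep log() finite
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    dz = p - y                                    # d(loss) / d(logit)
    dh = dz * head["w"]                           # gradient into the encoder
    head["w"] -= lr * dz * h
    head["b"] -= lr * dz
    for j, v in enumerate(x):
        enc[j] -= lr * dh * v
    return loss


def train_alternating(data, steps=200, lr=0.1):
    """Alternate tabular and image updates through one shared head."""
    enc_tab = [0.1, 0.1]           # assumes 2 tabular features
    enc_img = [0.1, 0.1, 0.1]      # assumes 3 precomputed image features
    head = {"w": 0.5, "b": 0.0}
    history = []
    for _ in range(steps):
        total = 0.0
        for x_tab, x_img, y in data:
            total += unimodal_step(enc_tab, head, x_tab, y, lr)  # tabular turn
            total += unimodal_step(enc_img, head, x_img, y, lr)  # image turn
        history.append(total)
    return enc_tab, enc_img, head, history


# Toy usage with two samples: (tabular features, image features, label).
toy_data = [
    ([1.0, 0.0], [1.0, 0.0, 0.5], 1),
    ([0.0, 1.0], [0.0, 1.0, 0.5], 0),
]
enc_tab, enc_img, head, history = train_alternating(toy_data)
```

Because each turn backpropagates through only one encoder, the per-modality gradients on the shared head are available separately, which is the point at which a cross-modal alignment step could be applied before the head is updated.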