TMTE: Effective Multimodal Graph Learning with Task-aware Modality and Topology Co-evolution

arXiv cs.LG / 3/31/2026


Key Points

  • The paper identifies quality limitations in real-world multimodal-attributed graphs (MAGs), including noisy interactions, missing connections, and task-agnostic relational structures that reduce transfer across tasks.
  • It proposes TMTE (Task-aware Modality and Topology co-Evolution), a closed-loop multimodal graph learning framework that jointly and iteratively optimizes both graph topology and multimodal representations for a specific target task.
  • TMTE models topology evolution as multi-perspective metric learning over modality embeddings using an anchor-based approximation, while modality evolution uses smoothness-regularized fusion with cross-modal alignment.
  • Experiments across 9 MAG datasets (plus 1 non-graph multimodal dataset) and 6 graph-centric/modality-centric tasks show consistent state-of-the-art performance gains.
  • The authors release their code publicly (via an anonymized link in the paper) to support reproduction and further development of the TMTE approach.
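The anchor-based approximation mentioned above is a standard trick for scalable graph structure learning: instead of materializing the O(N²) pairwise similarity matrix over node embeddings, one computes similarities only against a small set of anchor points and uses the resulting node-anchor matrix as an implicit adjacency. The sketch below illustrates this idea under our own assumptions (random anchor sampling, cosine similarity, row-softmax normalization); it is not the paper's exact formulation.

```python
import numpy as np

def anchor_based_topology(X, num_anchors=32, seed=0):
    """Approximate a dense node-node similarity graph via anchors.

    Rather than the O(N^2) pairwise matrix, compute node-to-anchor
    similarities Z (N x k) and treat A ~= Z @ Z.T implicitly.
    Anchor selection and normalization here are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    anchors = X[rng.choice(X.shape[0], size=num_anchors, replace=False)]
    # cosine similarity between all nodes and the anchors
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    An = anchors / (np.linalg.norm(anchors, axis=1, keepdims=True) + 1e-8)
    S = Xn @ An.T                                   # (N, k) similarities
    Z = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)  # row-normalize
    return Z                                        # implicit A ~= Z @ Z.T

X = np.random.default_rng(1).normal(size=(100, 16))
Z = anchor_based_topology(X, num_anchors=8)
print(Z.shape)  # (100, 8): store Z, never the full 100x100 adjacency
```

The payoff is that downstream message passing can be factored through Z (e.g. `Z @ (Z.T @ H)`), keeping cost linear in the number of nodes rather than quadratic.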

Abstract

Multimodal-attributed graphs (MAGs) are a fundamental data structure for multimodal graph learning (MGL), enabling both graph-centric and modality-centric tasks. However, our empirical analysis reveals inherent topology quality limitations in real-world MAGs, including noisy interactions, missing connections, and task-agnostic relational structures. A single graph derived from generic relationships is therefore unlikely to be universally optimal for diverse downstream tasks. To address this challenge, we propose Task-aware Modality and Topology co-Evolution (TMTE), a novel MGL framework that jointly and iteratively optimizes graph topology and multimodal representations toward the target task. TMTE is motivated by the bidirectional coupling between modality and topology: multimodal attributes induce relational structures, while graph topology shapes modality representations. Concretely, TMTE casts topology evolution as multi-perspective metric learning over modality embeddings with an anchor-based approximation, and formulates modality evolution as smoothness-regularized fusion with cross-modal alignment, yielding a closed-loop task-aware co-evolution process. Extensive experiments on 9 MAG datasets and 1 non-graph multimodal dataset across 6 graph-centric and modality-centric tasks show that TMTE consistently achieves state-of-the-art performance. Our code is available at https://anonymous.4open.science/r/TMTE-1873.
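The abstract's "smoothness-regularized fusion with cross-modal alignment" can be made concrete with two familiar objectives: a graph Laplacian smoothness penalty on the fused representations, and an alignment term pulling the per-modality embeddings together. The sketch below is our reading of those terms, not the paper's actual loss (the fusion weight `alpha` and the cosine-based alignment are assumptions):

```python
import numpy as np

def coevolution_losses(H_text, H_img, A, alpha=0.5):
    """Illustrative losses for task-aware modality evolution.

    - Fuse two modality embeddings with a simple convex combination.
    - Smoothness: tr(H^T L H) with L = D - A, encouraging connected
      nodes to have similar fused representations.
    - Alignment: mean cosine distance between the two modality views.
    """
    H = alpha * H_text + (1 - alpha) * H_img        # fused representation
    D = np.diag(A.sum(axis=1))
    L = D - A                                        # unnormalized Laplacian
    smooth = np.trace(H.T @ L @ H) / H.shape[0]
    tn = H_text / (np.linalg.norm(H_text, axis=1, keepdims=True) + 1e-8)
    im = H_img / (np.linalg.norm(H_img, axis=1, keepdims=True) + 1e-8)
    align = 1.0 - (tn * im).sum(axis=1).mean()       # 0 when views coincide
    return smooth, align
```

In the closed loop described above, minimizing such losses updates the modality representations, which in turn feed back into the metric-learned topology on the next iteration.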