When Modalities Remember: Continual Learning for Multimodal Knowledge Graphs

arXiv cs.CL / 4/6/2026


Key Points

  • The paper studies continual multimodal knowledge graph reasoning (CMMKGR) to handle real-world MMKGs that evolve with new entities, relations, and multimodal evidence over time.
  • It introduces MRCKG, which uses a multimodal-structural collaborative curriculum to progressively learn new triples based on their structural connectivity to the historical graph and their multimodal compatibility.
  • MRCKG adds a cross-modal knowledge preservation mechanism aimed at reducing catastrophic forgetting by stabilizing entity representations, maintaining relational semantic consistency, and anchoring modalities.
  • The method further uses a multimodal contrastive replay scheme with a two-stage optimization process to reinforce previously learned knowledge through multimodal importance sampling and representation alignment.
  • Experiments across multiple datasets indicate that MRCKG both retains earlier multimodal knowledge and substantially improves learning of newly added knowledge.

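The curriculum idea in the second point can be sketched as a simple scoring rule: rank new triples by how connected they are to the historical graph and how consistent each entity's modalities are, then learn easy (high-scoring) triples first. The function names, the cosine-based compatibility measure, and the `alpha` weighting below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def curriculum_scores(new_triples, historical_entities, img_emb, txt_emb,
                      alpha=0.5):
    """Score new (head, relation, tail) triples for easy-to-hard scheduling.

    Structural connectivity: fraction of the triple's entities already in
    the historical graph. Multimodal compatibility: cosine similarity
    between an entity's image and text embeddings, averaged over the triple.
    This is an illustrative sketch, not the paper's actual scoring function.
    """
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    scores = []
    for (h, _r, t) in new_triples:
        connectivity = sum(e in historical_entities for e in (h, t)) / 2.0
        compatibility = (cos(img_emb[h], txt_emb[h]) +
                         cos(img_emb[t], txt_emb[t])) / 2.0
        scores.append(alpha * connectivity + (1 - alpha) * compatibility)
    return scores

# Higher score = easier: sort descending so well-connected,
# modality-consistent triples are scheduled first.
```

A real implementation would plug these scores into a pacing function that gradually admits harder triples per training stage.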
Abstract

Real-world multimodal knowledge graphs (MMKGs) are dynamic, with new entities, relations, and multimodal knowledge emerging over time. Existing continual knowledge graph reasoning (CKGR) methods focus on structural triples and cannot fully exploit multimodal signals from new entities. Existing multimodal knowledge graph reasoning (MMKGR) methods, however, usually assume static graphs and suffer catastrophic forgetting as graphs evolve. To address this gap, we present a systematic study of continual multimodal knowledge graph reasoning (CMMKGR). We construct several continual multimodal knowledge graph benchmarks from existing MMKG datasets and propose MRCKG, a new CMMKGR model. Specifically, MRCKG employs a multimodal-structural collaborative curriculum to schedule progressive learning based on the structural connectivity of new triples to the historical graph and their multimodal compatibility. It also introduces a cross-modal knowledge preservation mechanism to mitigate forgetting through entity representation stability, relational semantic consistency, and modality anchoring. In addition, a multimodal contrastive replay scheme with a two-stage optimization strategy reinforces learned knowledge via multimodal importance sampling and representation alignment. Experiments on multiple datasets show that MRCKG preserves previously learned multimodal knowledge while substantially improving the learning of new knowledge.
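The replay scheme described above combines two standard ingredients: sampling buffered triples in proportion to an importance weight, and a contrastive alignment loss across modalities. A minimal sketch of both, assuming an InfoNCE-style loss and leaving the importance weights (e.g., forgetting estimates) as an input:

```python
import numpy as np

def importance_sample(buffer, weights, k, rng=None):
    """Draw k replay items with probability proportional to importance.

    How the weights are computed is left open here; the paper's
    multimodal importance sampling is more specific than this sketch.
    """
    rng = rng or np.random.default_rng(0)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(buffer), size=k, replace=False, p=p)
    return [buffer[i] for i in idx]

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss aligning paired cross-modal representations.

    Row i of `anchors` (e.g., image embeddings) should match row i of
    `positives` (e.g., text embeddings); other rows act as negatives.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    b = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

In a two-stage optimization, one would typically first fit the new triples, then replay the sampled buffer with this alignment loss to keep old multimodal representations consistent.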