RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion

arXiv cs.AI / 4/29/2026


Key Points

  • The paper argues that many multi-modal knowledge graph completion (MMKGC) approaches handle both global retrieval and local reranking with a single embedding scorer, and that this coupling limits performance.
  • It introduces RADD (Retrieval-Augmented Discrete Diffusion), which decouples retrieval from reranking: a relation-aware multimodal KGE retriever handles global search and also serves as a distillation teacher.
  • A conditional discrete denoiser generates shortlist-level entity-identity candidates; training jointly combines KGE supervision, denoising cross-entropy, and temperature-scaled distillation.
  • During inference, the Diff-Rerank process retrieves a top-K shortlist first (to guarantee high recall) and then reranks with the denoiser (to improve precision), a design validated by experiments and ablations across three benchmarks.
  • Experiments show RADD achieves the best results and consistent improvements over unimodal, multimodal, and LLM-based baselines on multiple MMKGC datasets.
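The retrieve-then-rerank flow described in the key points can be sketched as follows. This is a minimal illustration with a hypothetical interface (`retriever_scores`, `denoiser_scores_fn`, and `k` are placeholder names, not the paper's API): a cheap global scorer builds a top-K shortlist for recall, and a more expensive model rescores only that shortlist for precision.

```python
import numpy as np

def diff_rerank(retriever_scores, denoiser_scores_fn, k=100):
    """Two-stage retrieve-then-rerank sketch (hypothetical interface,
    not the authors' code). retriever_scores: one score per entity in
    the full entity set; denoiser_scores_fn: rescores a shortlist of
    entity ids and returns one score per shortlisted entity."""
    # Stage 1: global retrieval over the full entity set (high recall).
    shortlist = np.argsort(-retriever_scores)[:k]
    # Stage 2: rerank only the K shortlisted entities with the
    # more expensive scorer (high precision on a small candidate set).
    rerank_scores = denoiser_scores_fn(shortlist)
    order = np.argsort(-rerank_scores)
    return shortlist[order]
```

Because the denoiser never sees entities outside the shortlist, the retriever's recall is a hard ceiling on final accuracy, which matches the paper's "recall is a strict prerequisite for precision" framing.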

Abstract

Most multi-modal knowledge graph completion (MMKGC) models use one embedding scorer to do both retrieval over the full entity set and final decision making. We argue that this coupling is a core bottleneck: global high-recall search and local fine-grained disambiguation require different inductive biases. Therefore, we propose a Retrieval-Augmented Discrete Diffusion (RADD) framework to decouple retrieval and reranking for MMKGC. A relation-aware multimodal KGE retriever serves as both global retriever and distillation teacher, while a conditional discrete denoiser performs shortlist-level entity-identity generation for reranking. Training combines KGE supervision, denoising cross-entropy, and temperature-scaled distillation from the retriever to the denoiser. At inference, the designed Diff-Rerank first forms a top-K shortlist with the retriever and then reranks it with the denoiser, ensuring that recall is a strict prerequisite for precision. Experiments on three MMKGC benchmarks show that RADD achieves the best performance and consistent gains over strong unimodal, multimodal, and LLM-based baselines, while ablations further verify the contribution of each component.
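The abstract's training objective combines KGE supervision, denoising cross-entropy, and temperature-scaled distillation from retriever to denoiser. The distillation term can be sketched in the standard knowledge-distillation form: KL divergence between teacher and student softmax distributions at temperature T, scaled by T² to keep gradients comparable across temperatures. The exact loss RADD uses may differ; this is only an assumed standard formulation.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Numerically stable softmax at temperature T."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Temperature-scaled distillation term (standard KD sketch,
    not necessarily RADD's exact loss): T^2 * KL(p_T || q_T), where
    p_T and q_T are teacher/student softmaxes at temperature T."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T ** 2) * np.sum(p * (np.log(p) - np.log(q)))
```

A higher temperature softens both distributions, so the student (denoiser) learns from the teacher's relative preferences over the whole shortlist rather than only its top prediction.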