RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion

arXiv cs.AI / 4/29/2026


Key Points

  • The paper argues that many multi-modal knowledge graph completion (MMKGC) approaches handle both global retrieval and local reranking with a single embedding scorer, and that this coupling limits performance.
  • It introduces RADD (Retrieval-Augmented Discrete Diffusion), which decouples retrieval from reranking: a relation-aware multimodal KGE retriever handles global search and also serves as a distillation teacher.
  • A conditional discrete denoiser generates shortlist-level entity-identity candidates; training jointly combines KGE supervision, denoising cross-entropy, and temperature-scaled distillation.
  • During inference, the Diff-Rerank process retrieves a top-K shortlist first (to guarantee high recall) and then reranks with the denoiser (to improve precision), a design validated by experiments and ablations across three benchmarks.
  • Experiments show RADD achieves the best results and consistent improvements over unimodal, multimodal, and LLM-based baselines on multiple MMKGC datasets.
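The retrieve-then-rerank flow described in the key points can be sketched as follows. This is a minimal illustration with a hypothetical interface (`retriever_scores`, `denoiser_scores_fn`, and `k` are placeholder names, not the paper's API): a cheap global scorer builds a top-K shortlist for recall, and a more expensive model rescores only that shortlist for precision.

```python
import numpy as np

def diff_rerank(retriever_scores, denoiser_scores_fn, k=100):
    """Two-stage retrieve-then-rerank sketch (hypothetical interface,
    not the authors' code). retriever_scores: one score per entity in
    the full entity set; denoiser_scores_fn: rescores a shortlist of
    entity ids and returns one score per shortlisted entity."""
    # Stage 1: global retrieval over the full entity set (high recall).
    shortlist = np.argsort(-retriever_scores)[:k]
    # Stage 2: rerank only the K shortlisted entities with the
    # more expensive scorer (high precision on a small candidate set).
    rerank_scores = denoiser_scores_fn(shortlist)
    order = np.argsort(-rerank_scores)
    return shortlist[order]
```

Because the denoiser never sees entities outside the shortlist, the retriever's recall is a hard ceiling on final accuracy, which matches the paper's "recall is a strict prerequisite for precision" framing.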

Abstract

Most multi-modal knowledge graph completion (MMKGC) models use one embedding scorer to do both retrieval over the full entity set and final decision making. We argue that this coupling is a core bottleneck: global high-recall search and local fine-grained disambiguation require different inductive biases. Therefore, we propose a Retrieval-Augmented Discrete Diffusion (RADD) framework to decouple retrieval and reranking for MMKGC. A relation-aware multimodal KGE retriever serves as both global retriever and distillation teacher, while a conditional discrete denoiser performs shortlist-level entity-identity generation for reranking. Training combines KGE supervision, denoising cross-entropy, and temperature-scaled distillation from the retriever to the denoiser. At inference, the designed Diff-Rerank first forms a top-K shortlist with the retriever and then reranks it with the denoiser, ensuring that recall is a strict prerequisite for precision. Experiments on three MMKGC benchmarks show that RADD achieves the best performance and consistent gains over strong unimodal, multimodal, and LLM-based baselines, while ablations further verify the contribution of each component.
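The abstract's training objective combines KGE supervision, denoising cross-entropy, and temperature-scaled distillation from retriever to denoiser. The distillation term can be sketched in the standard knowledge-distillation form: KL divergence between teacher and student softmax distributions at temperature T, scaled by T² to keep gradients comparable across temperatures. The exact loss RADD uses may differ; this is only an assumed standard formulation.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Numerically stable softmax at temperature T."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Temperature-scaled distillation term (standard KD sketch,
    not necessarily RADD's exact loss): T^2 * KL(p_T || q_T), where
    p_T and q_T are teacher/student softmaxes at temperature T."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T ** 2) * np.sum(p * (np.log(p) - np.log(q)))
```

A higher temperature softens both distributions, so the student (denoiser) learns from the teacher's relative preferences over the whole shortlist rather than only its top prediction.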