Guaranteeing Knowledge Integration with Joint Decoding for Retrieval-Augmented Generation

arXiv cs.CL / 4/10/2026


Key Points

  • The paper identifies an “integration bottleneck” in retrieval-augmented generation (RAG), where LLMs may retrieve relevant documents but still fail to use them due to conflicts with their parametric knowledge.
  • It proposes GuarantRAG, which decouples reasoning from evidence integration by generating an Inner-Answer from parametric knowledge and a Refer-Answer constrained by retrieved documents using a novel Contrastive DPO objective.
  • A joint decoding mechanism then fuses the Inner-Answer’s logical coherence with the Refer-Answer’s factual precision at the token level, rather than relying on naive concatenation.
  • Experiments across five QA benchmarks show accuracy improvements of up to 12.1% and a 16.3% reduction in hallucinations versus standard and dynamic RAG baselines.
  • Overall, the work frames evidence integration as an explicit, train-and-decode problem rather than a retrieval-quality problem alone.
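The token-level fusion described in the key points can be illustrated with a minimal sketch. The interpolation rule and the fixed fusion weight `alpha` below are assumptions for illustration; the paper describes the fusion as dynamic, and its exact rule may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def joint_decode_step(inner_logits, refer_logits, alpha=0.5):
    """One decoding step that fuses two per-token logit vectors:
    `inner_logits` from the parametric Inner-Answer pass and
    `refer_logits` from the evidence-constrained Refer-Answer pass.
    `alpha` is a hypothetical fusion weight; returns the greedy
    next-token index and the fused probability distribution."""
    fused = [(1 - alpha) * i + alpha * r
             for i, r in zip(inner_logits, refer_logits)]
    next_token = max(range(len(fused)), key=fused.__getitem__)
    return next_token, softmax(fused)

# Toy 4-token vocabulary: the parametric pass prefers token 0,
# the evidence-grounded pass prefers token 1.
inner = [2.0, 0.5, 0.1, -1.0]
refer = [0.1, 3.0, 0.2, -0.5]
tok, probs = joint_decode_step(inner, refer, alpha=0.7)
```

With `alpha=0.7` the fused distribution leans toward the evidence-grounded pass, so the factual token wins while the Inner-Answer still shapes the rest of the distribution.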

Abstract

Retrieval-Augmented Generation (RAG) significantly enhances Large Language Models (LLMs) by providing access to external knowledge. However, current research primarily focuses on retrieval quality, often overlooking the critical "integration bottleneck": even when relevant documents are retrieved, LLMs frequently fail to utilize them effectively due to conflicts with their internal parametric knowledge. In this paper, we argue that implicitly resolving this conflict in a single generation pass is suboptimal. We introduce GuarantRAG, a framework that explicitly decouples reasoning from evidence integration. First, we generate an "Inner-Answer" based solely on parametric knowledge to capture the model's reasoning flow. Second, to guarantee faithful evidence extraction, we generate a "Refer-Answer" using a novel Contrastive DPO objective. This objective treats the parametric Inner-Answer as a negative constraint and the retrieved documents as positive ground truth, forcing the model to suppress internal hallucinations in favor of external evidence during this phase. Finally, rather than naive concatenation or using the DPO trained model directly, we propose a joint decoding mechanism that dynamically fuses the logical coherence of the Inner-Answer with the factual precision of the Refer-Answer at the token level. Experiments on five QA benchmarks demonstrate that GuarantRAG improves accuracy by up to 12.1% and reduces hallucinations by 16.3% compared to standard and dynamic RAG baselines.
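The Contrastive DPO objective in the abstract can be sketched as a standard DPO-style preference loss in which the evidence-grounded Refer-Answer plays the preferred role and the parametric Inner-Answer the rejected one. This is an assumed instantiation of the standard DPO formula, not the paper's exact objective; all names and the sequence-level log-probability inputs are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def contrastive_dpo_loss(logp_refer, logp_inner,
                         ref_logp_refer, ref_logp_inner,
                         beta=0.1):
    """DPO-style loss: the Refer-Answer (grounded in retrieved
    documents) is treated as the preferred sequence, and the
    parametric Inner-Answer as the rejected one. Inputs are
    sequence-level log-probabilities under the trained policy
    and under a frozen reference model; `beta` scales the
    implicit reward margin, as in standard DPO."""
    margin = beta * ((logp_refer - ref_logp_refer)
                     - (logp_inner - ref_logp_inner))
    return -math.log(sigmoid(margin))

# When the policy already prefers the evidence-grounded answer
# relative to the reference model, the margin is positive and
# the loss falls below log(2); at zero margin it equals log(2).
loss = contrastive_dpo_loss(-10.0, -20.0, -12.0, -18.0, beta=0.1)
```

Minimizing this loss pushes probability mass toward the evidence-grounded sequence and away from the parametric one, matching the abstract's description of suppressing internal hallucinations in favor of external evidence.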