PAVE: Premise-Aware Validation and Editing for Retrieval-Augmented LLMs

arXiv cs.CL · March 24, 2026


Key Points

  • The paper introduces PAVE, an inference-time validation and editing layer for retrieval-augmented LLMs that verifies whether a drafted answer is supported by explicitly extracted premises.
  • PAVE decomposes retrieved context into question-conditioned atomic facts, generates an initial answer, scores support against the extracted premises, and revises outputs with low support before finalizing.
  • The method produces an auditable reasoning trace that includes explicit premises, support scores, and revision decisions, rather than relying on implicit, uncheckable answer commitment.
  • In controlled ablation experiments with a fixed retriever and model backbone, PAVE improves evidence-grounded QA performance over simpler post-retrieval baselines, with the largest reported gain reaching 32.7 accuracy points on a span-grounded benchmark.

Abstract

Retrieval-augmented language models can retrieve relevant evidence yet still commit to answers before explicitly checking whether the retrieved context supports the conclusion. We present PAVE (Premise-Aware Validation and Editing), an inference-time validation layer for evidence-grounded question answering. PAVE decomposes retrieved context into question-conditioned atomic facts, drafts an answer, scores how well that draft is supported by the extracted premises, and revises low-support outputs before finalization. The resulting trace makes answer commitment auditable at the level of explicit premises, support scores, and revision decisions. In controlled ablations with a fixed retriever and backbone, PAVE outperforms simpler post-retrieval baselines in two evidence-grounded QA settings, with the largest gain reaching 32.7 accuracy points on a span-grounded benchmark. We view these findings as proof-of-concept evidence that explicit premise extraction plus support-gated revision can strengthen evidence-grounded consistency in retrieval-augmented LLM systems.
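
The extract → draft → score → revise loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the sentence-split premise extractor, the lexical support proxy, and the 0.5 threshold are all assumptions standing in for the LLM-based components PAVE actually uses.

```python
# Hedged sketch of a PAVE-style validate-then-edit loop.
# All components here are crude stand-ins for LLM calls.

def extract_premises(context: str, question: str) -> list[str]:
    # Stand-in for question-conditioned atomic fact extraction;
    # here we simply treat each sentence as one premise.
    return [s.strip() for s in context.split(".") if s.strip()]

def support_score(answer: str, premises: list[str]) -> float:
    # Stand-in for an LLM support scorer: fraction of draft tokens
    # found anywhere in the extracted premises (lexical proxy).
    tokens = answer.lower().split()
    joined = " ".join(premises).lower()
    hits = sum(1 for t in tokens if t in joined)
    return hits / max(len(tokens), 1)

def pave_answer(context, question, draft_fn, revise_fn, threshold=0.5):
    # 1) decompose context into premises, 2) draft an answer,
    # 3) score support, 4) revise if support is below the gate.
    premises = extract_premises(context, question)
    draft = draft_fn(question, context)
    score = support_score(draft, premises)
    revised = score < threshold
    final = revise_fn(question, premises, draft) if revised else draft
    # The trace makes the commitment auditable: premises, score, decision.
    trace = {"premises": premises, "draft": draft,
             "support": score, "revised": revised}
    return final, trace
```

A toy run with stub draft/revise functions shows the gating behavior: an unsupported draft scores low, triggers revision, and the returned trace records the decision.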