Stateful Evidence-Driven Retrieval-Augmented Generation with Iterative Reasoning

arXiv cs.CL · April 17, 2026


Key Points

  • The paper argues that standard retrieval-augmented generation (RAG) is often unstable because it uses flat context and stateless retrieval, which limits reliable question answering.
  • It proposes “Stateful Evidence-Driven RAG with Iterative Reasoning,” modeling QA as progressive evidence accumulation rather than a one-shot retrieval-and-generate flow.
  • Retrieved documents are transformed into structured reasoning units that include explicit relevance and confidence signals, and they are stored in a persistent evidence pool capturing both supporting and contradicting information.
  • The method performs deficiency and conflict analysis over the evidence, then iteratively refines queries to drive better subsequent retrieval and improve robustness to noisy results.
  • Experiments across multiple QA benchmarks show consistent gains versus standard RAG and multi-step baselines, with stable performance even under substantial retrieval noise.
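The "structured reasoning unit" and "persistent evidence pool" described above could be organized along the following lines. This is a minimal illustrative sketch: the class names, fields, and signal ranges are assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class EvidenceUnit:
    """Hypothetical structured reasoning unit (fields are illustrative)."""
    claim: str          # statement extracted from a retrieved document
    source_id: str      # identifier of the originating document
    relevance: float    # explicit relevance signal, assumed in [0, 1]
    confidence: float   # explicit confidence signal, assumed in [0, 1]
    supports: bool      # True if supporting, False if contradicting

@dataclass
class EvidencePool:
    """Persistent pool that survives across retrieval rounds."""
    units: list = field(default_factory=list)

    def add(self, unit: EvidenceUnit) -> None:
        self.units.append(unit)

    def supporting(self) -> list:
        return [u for u in self.units if u.supports]

    def contradicting(self) -> list:
        return [u for u in self.units if not u.supports]

# Toy usage: the pool keeps both sides, so conflict analysis can see them.
pool = EvidencePool()
pool.add(EvidenceUnit("Paris is the capital of France", "doc-1", 0.9, 0.95, True))
pool.add(EvidenceUnit("Lyon is the capital of France", "doc-7", 0.8, 0.30, False))
print(len(pool.supporting()), len(pool.contradicting()))  # 1 1
```

Keeping contradicting units rather than discarding them is what lets a later analysis step detect and resolve conflicts instead of silently trusting the first retrieved claim.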

Abstract

Retrieval-Augmented Generation (RAG) grounds Large Language Models (LLMs) in external knowledge but often suffers from flat context representations and stateless retrieval, leading to unstable performance. We propose Stateful Evidence-Driven RAG with Iterative Reasoning, a framework that models question answering as a progressive evidence accumulation process. Retrieved documents are converted into structured reasoning units with explicit relevance and confidence signals and maintained in a persistent evidence pool capturing both supportive and non-supportive information. The framework performs evidence-driven deficiency analysis to identify gaps and conflicts and iteratively refines queries to guide subsequent retrieval. This iterative reasoning process enables stable evidence aggregation and improves robustness to noisy retrieval. Experiments on multiple question answering benchmarks demonstrate consistent improvements over standard RAG and multi-step baselines; the framework effectively accumulates high-quality evidence and maintains stable performance under substantial retrieval noise.
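The control flow the abstract describes, retrieve, extract units into a persistent pool, analyze deficiencies, refine the query, repeat, can be sketched as below. All component functions (`retrieve`, `extract_units`, `find_gaps`, `refine_query`, `generate`) are hypothetical stand-ins wired up with a toy two-hop example; they are not the paper's actual implementations.

```python
def answer_question(question, retrieve, extract_units, find_gaps,
                    refine_query, generate, max_rounds=3):
    """Iterative evidence accumulation: the pool persists across rounds."""
    pool = []                                  # persistent evidence pool
    query = question
    for _ in range(max_rounds):
        docs = retrieve(query)                 # one retrieval round
        pool.extend(extract_units(docs))       # add structured units to pool
        gaps = find_gaps(question, pool)       # deficiency/conflict analysis
        if not gaps:                           # evidence deemed sufficient
            break
        query = refine_query(question, gaps)   # drive the next retrieval
    return generate(question, pool)

# Toy two-hop corpus: the answer needs a second, refined retrieval.
corpus = {
    "Les Misérables": ["Victor Hugo wrote Les Misérables."],
    "Victor Hugo": ["Victor Hugo was born in Besançon."],
}

result = answer_question(
    "Where was the author of Les Misérables born?",
    # Keyword lookup standing in for a real retriever.
    retrieve=lambda q: [d for k, ds in corpus.items() if k in q for d in ds],
    extract_units=lambda docs: list(docs),
    # Toy deficiency check: the birthplace is still missing until a unit
    # mentions "born"; the named follow-up entity is hard-coded here.
    find_gaps=lambda q, pool: [] if any("born" in u for u in pool)
                              else ["Victor Hugo"],
    refine_query=lambda q, gaps: gaps[0],
    generate=lambda q, pool: " ".join(pool),
)
print(result)
```

Round one retrieves only the authorship fact; the deficiency check flags the missing birthplace, the refined query targets the author, and round two completes the evidence before generation, which is the stateful behavior that a one-shot retrieve-and-generate pipeline cannot reproduce.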