Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning

arXiv cs.CL / 4/14/2026


Key Points

  • The paper reframes retrieval-augmented generation (RAG) by integrating retrieval control directly into the token-level decoding process rather than using separate external controllers or classifiers.
  • It proposes GRIP (Generation-guided Retrieval with Information Planning), where the model emits control tokens to decide when to retrieve, how to reformulate queries, and when to stop within a single autoregressive trajectory.
  • The core mechanism, Self-Triggered Information Planning, tightly couples retrieval decisions with reasoning and supports dynamic multi-step inference with on-the-fly evidence integration.
  • The authors introduce structured supervision spanning answerable, partially answerable, and multi-hop query types, each mapped to specific token patterns for learning retrieval behavior.
  • Experiments on five QA benchmarks report that GRIP outperforms strong RAG baselines and is competitive with GPT-4o while using substantially fewer parameters.
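The control-token mechanism described above can be pictured as a decoding loop that watches the model's own output stream for special markers. The sketch below is a toy illustration, not the paper's implementation: the token names (`<RET>`, `</QUERY>`, `<EOS>`), the one-shot generation policy, and the retriever stub are all assumptions made for demonstration.

```python
# Hypothetical sketch of a self-triggered retrieval loop in the spirit of GRIP.
# Control-token names and both stubs are illustrative assumptions.

RETRIEVE, QUERY_END, STOP = "<RET>", "</QUERY>", "<EOS>"

def mock_generate(context: str) -> str:
    """Stand-in for one autoregressive decoding segment."""
    # Toy policy: request evidence once, then answer and terminate.
    if RETRIEVE not in context:
        return f"{RETRIEVE} capital of France {QUERY_END}"
    return f"Paris {STOP}"

def mock_retrieve(query: str) -> str:
    """Stand-in for the external retriever."""
    return f"[doc] {query}: Paris is the capital of France."

def decode(question: str, max_rounds: int = 4) -> str:
    context = question
    for _ in range(max_rounds):
        step = mock_generate(context)
        context += " " + step
        if STOP in step:                 # model decides to terminate
            return context
        if RETRIEVE in step:             # model triggers retrieval itself
            query = step.split(RETRIEVE)[1].split(QUERY_END)[0].strip()
            context += " " + mock_retrieve(query)  # on-the-fly evidence
    return context

print(decode("What is the capital of France?"))
```

The key property being illustrated: retrieval, query reformulation, and termination are all expressed as ordinary token emissions inside one trajectory, so no external controller or classifier inspects the state between steps.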

Abstract

We revisit retrieval-augmented generation (RAG) by embedding retrieval control directly into generation. Instead of treating retrieval as an external intervention, we express retrieval decisions within token-level decoding, enabling end-to-end coordination without additional controllers or classifiers. Under the paradigm of Retrieval as Generation, we propose **GRIP** (**G**eneration-guided **R**etrieval with **I**nformation **P**lanning), a unified framework in which the model regulates retrieval behavior through control-token emission. Central to GRIP is *Self-Triggered Information Planning*, which allows the model to decide when to retrieve, how to reformulate queries, and when to terminate, all within a single autoregressive trajectory. This design tightly couples retrieval and reasoning and supports dynamic multi-step inference with on-the-fly evidence integration. To supervise these behaviors, we construct a structured training set covering answerable, partially answerable, and multi-hop queries, each aligned with specific token patterns. Experiments on five QA benchmarks show that GRIP surpasses strong RAG baselines and is competitive with GPT-4o while using substantially fewer parameters.
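The structured supervision described in the abstract maps each query type to a distinct target token pattern. The schematic below is a guess at what such targets could look like; the actual marker vocabulary and trace format are not given in this summary, so every token name here is a placeholder.

```python
# Illustrative (assumed) target token patterns for the three query types the
# structured training set covers. Markers like <RET>, <DOC>, <ANS> are
# placeholders, not the paper's actual special tokens.
TRACES = {
    # Answerable from parametric knowledge: no retrieval trigger at all.
    "answerable":
        "Q: ... <ANS> ... <EOS>",
    # Partially answerable: one retrieval round before answering.
    "partially_answerable":
        "Q: ... <RET> sub-query </QUERY> <DOC> ... </DOC> <ANS> ... <EOS>",
    # Multi-hop: chained retrieval rounds, each conditioned on prior evidence.
    "multi_hop":
        "Q: ... <RET> hop-1 </QUERY> <DOC> ... </DOC> "
        "<RET> hop-2 </QUERY> <DOC> ... </DOC> <ANS> ... <EOS>",
}
```

Under this reading, the query taxonomy directly determines how many retrieval triggers appear in the supervision signal, which is how the model learns *when* to retrieve rather than being told by an external classifier.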