IndexRAG: Bridging Facts for Cross-Document Reasoning at Index Time

arXiv cs.CL / 3/18/2026

Key Points

  • IndexRAG shifts cross-document reasoning from online inference to offline indexing by identifying bridge entities across documents and generating bridging facts as independently retrievable units.
  • The approach requires no additional training or fine-tuning, enabling single-pass retrieval with a single LLM call at inference time.
  • Experiments on HotpotQA, 2WikiMultiHopQA, and MuSiQue show an average F1 improvement of 4.6 points over Naive RAG.
  • When combined with IRCoT, IndexRAG outperforms graph-based baselines like HippoRAG and FastGraphRAG while relying on flat retrieval, with code to be released upon acceptance.

Abstract

Multi-hop question answering (QA) requires reasoning across multiple documents, yet existing retrieval-augmented generation (RAG) approaches address this either through graph-based methods that require additional online processing or through iterative multi-step reasoning. We present IndexRAG, a novel approach that shifts cross-document reasoning from online inference to offline indexing. IndexRAG identifies bridge entities shared across documents and generates bridging facts as independently retrievable units, requiring no additional training or fine-tuning. Experiments on three widely used multi-hop QA benchmarks (HotpotQA, 2WikiMultiHopQA, MuSiQue) show that IndexRAG improves F1 over Naive RAG by 4.6 points on average, while requiring only single-pass retrieval and a single LLM call at inference time. When combined with IRCoT, IndexRAG outperforms all graph-based baselines on average, including HippoRAG and FastGraphRAG, while relying solely on flat retrieval. Our code will be released upon acceptance.
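To make the index-time idea concrete, here is a minimal sketch of bridge-entity detection and bridging-fact construction. Everything below is illustrative, not the paper's implementation: the toy entity extractor just matches capitalized tokens, and the bridging fact is a plain concatenation of the linked documents, whereas IndexRAG generates bridging facts with an LLM.

```python
# Hypothetical sketch of IndexRAG-style offline indexing (not the paper's code).
from collections import defaultdict

def extract_entities(doc: str) -> set[str]:
    # Toy stand-in for real entity extraction: capitalized tokens only.
    return {tok.strip(".,") for tok in doc.split() if tok[0].isupper() and len(tok) > 2}

def build_bridging_index(docs: dict[str, str]) -> list[dict]:
    # Step 1 (offline): map each entity to the documents mentioning it.
    entity_docs = defaultdict(set)
    for doc_id, text in docs.items():
        for ent in extract_entities(text):
            entity_docs[ent].add(doc_id)

    # Step 2: entities shared by >= 2 documents are bridge entities; for each,
    # emit a bridging fact as an independently retrievable unit alongside the
    # original documents. (The paper would LLM-generate this fact instead.)
    units = [{"type": "document", "id": d, "text": t} for d, t in docs.items()]
    for ent, doc_ids in entity_docs.items():
        if len(doc_ids) >= 2:
            merged = " ".join(docs[d] for d in sorted(doc_ids))
            units.append({"type": "bridging_fact", "entity": ent, "text": merged})
    return units

docs = {
    "d1": "Alan Turing worked at Bletchley Park.",
    "d2": "Bletchley Park is located in Milton Keynes.",
}
index = build_bridging_index(docs)
print(sorted(u["entity"] for u in index if u["type"] == "bridging_fact"))
# → ['Bletchley', 'Park']
```

At query time, a flat retriever over `units` can surface the bridging fact directly, so a two-hop question ("In which town did Alan Turing work?") needs only single-pass retrieval and one LLM call, which is the cost profile the paper reports.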