StratRAG: A Multi-Hop Retrieval Evaluation Dataset for Retrieval-Augmented Generation Systems

arXiv cs.AI / 4/28/2026


Key Points

  • StratRAG is an open-source evaluation dataset designed to benchmark Retrieval-Augmented Generation (RAG) systems on multi-hop reasoning under realistic, noisy document-pool conditions.
  • The dataset contains 2,200 examples derived from HotpotQA (distractor setting), covering three question types (bridge, comparison, yes-no) with pools of 15 candidate documents that include exactly 2 gold documents plus 13 topical distractors.
  • The authors benchmark three retrieval strategies (BM25, dense retrieval using all-MiniLM-L6-v2, and hybrid fusion) and report Recall@k, MRR, and NDCG@5 on the validation set.
  • Hybrid retrieval delivers the best overall results (Recall@2 = 0.70, MRR = 0.93), but bridge questions remain harder (Recall@2 = 0.67), which the authors cite as motivation for future work on reinforcement-learning-based retrieval policies.
  • StratRAG is publicly available on Hugging Face for the research community to use and reproduce results.
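
To make the reported metrics concrete, here is a minimal sketch of how Recall@k, reciprocal rank (averaged over queries to give MRR), and NDCG@k could be computed for a single question with a 15-document pool and 2 gold documents. It assumes binary relevance and standard textbook definitions; the summary does not state the authors' exact formulas, and the document IDs and ranking below are made up for illustration.

```python
import math


def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold documents that appear in the top-k results."""
    top_k = set(ranked_ids[:k])
    return len(top_k & set(gold_ids)) / len(gold_ids)


def reciprocal_rank(ranked_ids, gold_ids):
    """1 / rank of the first gold document (0.0 if none is retrieved)."""
    gold = set(gold_ids)
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in gold:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, gold_ids, k):
    """NDCG@k with binary gains (assumption: gold = 1, distractor = 0)."""
    gold = set(gold_ids)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in gold
    )
    ideal_hits = min(len(gold), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical pool: 15 candidates (d0..d14), gold documents d3 and d7.
gold_ids = ["d3", "d7"]
rest = [f"d{i}" for i in range(15) if i not in (1, 3, 7)]
ranked_ids = ["d3", "d1", "d7"] + rest  # one gold at rank 1, the other at rank 3

r2 = recall_at_k(ranked_ids, gold_ids, k=2)   # 0.5: only one gold doc in the top 2
rr = reciprocal_rank(ranked_ids, gold_ids)    # 1.0: first gold doc is at rank 1
ndcg5 = ndcg_at_k(ranked_ids, gold_ids, k=5)
print(r2, rr, round(ndcg5, 3))
```

This toy case shows how a high reciprocal rank can coexist with a lower Recall@2: the first gold document is found immediately, but the second falls outside the top 2.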

Abstract

We introduce StratRAG, an open-source retrieval evaluation dataset for benchmarking Retrieval-Augmented Generation (RAG) systems on multi-hop reasoning tasks under realistic, noisy document-pool conditions. Derived from HotpotQA (distractor setting), StratRAG comprises 2,200 examples across three question types -- bridge, comparison, and yes-no -- each paired with a pool of 15 candidate documents containing exactly 2 gold documents and 13 topically related distractors. We benchmark three retrieval strategies -- BM25, dense retrieval (all-MiniLM-L6-v2), and hybrid fusion -- reporting Recall@k, MRR, and NDCG@5 on the validation set. Hybrid retrieval achieves the best overall performance (Recall@2 = 0.70, MRR = 0.93), yet bridge questions remain substantially harder (Recall@2 = 0.67), motivating future work on reinforcement-learning-based retrieval policies. StratRAG is publicly available at https://huggingface.co/datasets/Aryanp088/StratRAG.
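
The abstract does not say how the hybrid strategy fuses BM25 and dense results. As an illustration only, the sketch below uses reciprocal rank fusion (RRF), one common way to merge two rankings; it is an assumption here, not the authors' confirmed method. The function name `rrf_fuse`, the constant `k=60`, and the toy rankings are all hypothetical.

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    Ties are broken by document ID so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-4 lists from a sparse and a dense retriever.
bm25_ranking = ["d3", "d1", "d7", "d2"]
dense_ranking = ["d7", "d3", "d5", "d1"]

fused = rrf_fuse([bm25_ranking, dense_ranking])
print(fused)
```

RRF uses only rank positions, so it needs no score normalization between BM25 and embedding similarities. A weighted sum of normalized scores would be an equally plausible reading of "hybrid fusion".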
