SRAG: RAG with Structured Data Improves Vector Retrieval

arXiv cs.CL / March 31, 2026


Key Points

  • The paper argues that standard RAG retrieval rests on embedding-based representational similarity alone, and can underperform when accurate grounding requires more than vector similarity between queries and chunks.
  • It proposes Structured RAG (SRAG), which enriches both queries and retrieved chunks with structured signals such as topics, sentiments, query/chunk types, knowledge-graph triples, and semantic tags.
  • Experiments show SRAG significantly improves the retrieval process, indicating better alignment between user intent and the information retrieved.
  • Using GPT-5 as an LLM-as-a-judge in a question-answering setting, SRAG improves answer scoring by about 30% (p-value = 2e-13), with the largest gains on comparative, analytical, and predictive questions.
  • The authors report that SRAG supports broader, more diverse, and episodic-style retrieval while also improving tail-risk outcomes (more frequent large gains with relatively small losses).
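The enrichment idea behind the key points can be sketched in code. The snippet below is an illustrative toy, not the paper's implementation: the actual classifiers and tag formats are not specified in this summary, so the keyword-based type detector and the bracketed tag syntax here are assumptions. The core idea shown is that structured signals (query/chunk type, topics, knowledge-graph triples) are attached to the text before it is embedded, so the resulting vector reflects more than the raw wording.

```python
# Toy sketch of SRAG-style enrichment (assumed tag format, not the paper's).

def classify_type(text: str) -> str:
    """Toy query/chunk type classifier; the paper's classifier is unspecified."""
    t = text.lower()
    if any(w in t for w in ("compare", "versus", " vs ")):
        return "comparative"
    if any(w in t for w in ("how many", "percent", "%")):
        return "quantitative"
    return "informational"

def enrich(text: str, topics: list[str],
           triples: list[tuple[str, str, str]]) -> str:
    """Prepend structured signals (type, topics, KG triples) as semantic tags,
    so the embedding of the enriched string carries the structured context."""
    tags = [f"[type:{classify_type(text)}]"]
    tags += [f"[topic:{t}]" for t in topics]
    tags += [f"[triple:{s}|{p}|{o}]" for s, p, o in triples]
    return " ".join(tags) + " " + text

# Both the query and the chunks would be enriched the same way before embedding.
query = enrich("Compare RAG and SRAG retrieval quality",
               topics=["retrieval"],
               triples=[("SRAG", "extends", "RAG")])
print(query)
```

In this sketch the enriched string (tags plus original text) would then be passed to whatever embedding model the retrieval pipeline already uses, for queries and chunks alike.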

Abstract

Retrieval Augmented Generation (RAG) provides the necessary informational grounding to LLMs in the form of chunks retrieved from a vector database or through web search. RAG can also use knowledge-graph triples as a means of providing factual information to an LLM. However, retrieval is based only on representational similarity between a question and the contents, so the performance of RAG depends on the numeric vector representations of the query and the chunks. To improve these representations, we propose Structured RAG (SRAG), which adds structured information to a query as well as to the chunks, in the form of topics, sentiments, query and chunk types (e.g., informational, quantitative), knowledge-graph triples, and semantic tags. Experiments indicate that this method significantly improves the retrieval process. Using GPT-5 as an LLM-as-a-judge, results show that the method improves the score given to answers in a question-answering system by 30% (p-value = 2e-13) (with tighter bounds). The strongest improvement is in comparative, analytical, and predictive questions. The results suggest that our method enables broader, more diverse, and episodic-style retrieval. Tail-risk analysis shows that SRAG attains very large gains more often, with losses remaining minor in magnitude.
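For readers unfamiliar with the evaluation setup, the "improves the score by 30%" figure can be read as a relative gain in mean judge score across paired questions. The sketch below is a hedged illustration of that arithmetic only: the scores are made-up toy values on an assumed 1-10 scale, not the paper's data, and the judge (GPT-5 in the paper) is replaced by hard-coded numbers.

```python
# Toy illustration of a relative-improvement metric for LLM-as-a-judge scores.
# All score values below are invented for illustration; they are NOT from the paper.
from statistics import mean

def relative_improvement(baseline: list[float], srag: list[float]) -> float:
    """Relative gain of the SRAG system's mean judge score over the baseline's."""
    return (mean(srag) - mean(baseline)) / mean(baseline)

# Paired judge scores for the same questions under both systems (assumed 1-10 scale).
baseline_scores = [5.0, 6.0, 4.0, 5.5]
srag_scores     = [6.5, 7.5, 5.5, 7.0]

print(f"relative improvement: {relative_improvement(baseline_scores, srag_scores):.0%}")
```

A real evaluation would also attach a significance test to the paired score differences, which is how a p-value like the reported 2e-13 would be obtained.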