Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines

arXiv cs.CL · April 27, 2026


Key Points

  • The paper studies whether Query Performance Prediction (QPP) can select the best query reformulation variant in RAG pipelines without running full retrieval and generation for every variant.
  • Unlike traditional QPP that estimates query difficulty across topics, it focuses on intra-topic discrimination by choosing among multiple semantically equivalent variants for the same information need.
  • Experiments on TREC-RAG show a “utility gap”: variants that score well on retrieval ranking metrics (e.g., nDCG) do not necessarily yield the best generated answers.
  • Despite this divergence, QPP can reliably pick variants that improve end-to-end answer quality, and lightweight pre-retrieval predictors often achieve similar or better results than costly post-retrieval methods.
  • Overall, the findings support latency-efficient variant selection to make RAG more computationally affordable while maintaining output quality.
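To make the selection idea concrete, here is a minimal sketch of pre-retrieval QPP for variant selection. It uses average inverse document frequency (avg-IDF), a classic pre-retrieval predictor, to score competing reformulations of one information need and pick the highest-scoring variant before any retrieval runs. The toy corpus, the variants, and the choice of avg-IDF are illustrative assumptions, not the paper's exact setup.

```python
import math
from collections import Counter

# Toy corpus standing in for the retrieval collection (illustration only).
corpus = [
    "neural retrieval models rank passages for a query",
    "dense retrievers encode queries and passages into vectors",
    "sparse retrieval uses term matching such as bm25",
    "language models generate answers from retrieved passages",
]

# Document frequency of each term across the corpus.
df = Counter(term for doc in corpus for term in set(doc.split()))
N = len(corpus)

def avg_idf(query: str) -> float:
    """Average IDF of query terms: a classic pre-retrieval QPP score."""
    terms = query.lower().split()
    return sum(math.log(N / (1 + df.get(t, 0))) for t in terms) / len(terms)

# Semantically equivalent variants of the same information need
# (hypothetical reformulations, e.g. produced by an LLM).
variants = [
    "how do dense retrievers work",
    "dense retrievers encode queries into vectors",
    "explain dense retrieval",
]

# Select the variant the predictor expects to perform best,
# without running retrieval or generation for each candidate.
best = max(variants, key=avg_idf)
print(best)
```

A real pipeline would swap in stronger predictors (or post-retrieval signals when the budget allows), but the control flow is the same: score all variants cheaply, then execute retrieval and generation only for the winner.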

Abstract

Large Language Models (LLMs) have made query reformulation ubiquitous in modern retrieval and Retrieval-Augmented Generation (RAG) pipelines, enabling the generation of multiple semantically equivalent query variants. However, executing the full pipeline for every reformulation is computationally expensive, motivating selective execution: can we identify the best query variant before incurring downstream retrieval and generation costs? We investigate Query Performance Prediction (QPP) as a mechanism for variant selection across ad-hoc retrieval and end-to-end RAG. Unlike traditional QPP, which estimates query difficulty across topics, we study intra-topic discrimination: selecting the optimal reformulation among competing variants of the same information need. Through large-scale experiments on TREC-RAG using both sparse and dense retrievers, we evaluate pre- and post-retrieval predictors under correlation- and decision-based metrics. Our results reveal a systematic divergence between retrieval and generation objectives: variants that maximize ranking metrics such as nDCG often fail to produce the best generated answers, exposing a "utility gap" between retrieval relevance and generation fidelity. Nevertheless, QPP can reliably identify variants that improve end-to-end quality over the original query. Notably, lightweight pre-retrieval predictors frequently match or outperform more expensive post-retrieval methods, offering a latency-efficient approach to robust RAG.