CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation

arXiv cs.AI / 5/7/2026

💬 Opinion · Tools & Practical Usage · Models & Research

Key Points

  • The paper argues that RAG reranking should optimize for generation usefulness (e.g., uncertainty reduction), not just query–document relevance.
  • It introduces CAR (Confidence-Aware Reranking), a training-free, plug-and-play method that uses the generator’s confidence change to score and reorder documents.
  • CAR estimates confidence by measuring semantic consistency across multiple sampled answers under query-only versus query+document settings, promoting documents that raise confidence and demoting those that lower it.
  • Experiments on four BEIR datasets show consistent NDCG@5 improvements across sparse and dense retrievers, multiple reranker types, and several LLM backbones.
  • The method’s ranking improvements closely track downstream generation quality, correlating strongly with generation F1 (Spearman ρ = 0.964); the largest gain is for the YesNo reranker.

Abstract

Retrieval-Augmented Generation (RAG) depends on document ranking to provide useful evidence for generation, but conventional reranking methods mainly optimize query–document relevance rather than generation usefulness. A relevant document may still introduce noise, while a lower-ranked document may better reduce the generator's uncertainty. We propose CAR (Confidence-Aware Reranking), a query-guided, training-free, and plug-and-play reranking framework that uses generator confidence change as a document usefulness signal. CAR estimates confidence through the semantic consistency of multiple sampled answers under query-only and query–document conditions. Documents that significantly increase confidence are promoted, those that decrease confidence are demoted, and uncertain cases preserve the baseline order, while a query-level gate avoids unnecessary intervention on already confident queries. Experiments on four BEIR datasets show that CAR consistently improves NDCG@5 across sparse and dense retrievers, LLM-based and supervised rerankers, and four LLM backbones. Notably, CAR improves the YesNo reranker by 25.4% on average under Contriever retrieval, and its ranking gains strongly correlate with downstream generation F1 improvements, achieving Spearman ρ = 0.964.
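The reranking logic described in the abstract can be sketched in a few lines. The sketch below is a hypothetical illustration, not the paper's implementation: it uses a toy confidence proxy (fraction of sampled answers agreeing with the modal answer) in place of the paper's semantic-consistency measure, and the `delta` dead zone and `gate` threshold are assumed parameters for the "uncertain cases preserve the baseline order" and query-level gating behaviors.

```python
from collections import Counter

def consistency(answers):
    """Toy confidence proxy: share of samples agreeing with the modal answer.
    (The paper uses semantic consistency across sampled answers instead.)"""
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)

def car_rerank(query_samples, doc_samples, base_order, delta=0.1, gate=0.9):
    """Reorder documents by the generator's confidence change.

    query_samples: answers sampled from the query alone
    doc_samples:   {doc_id: answers sampled from query + document}
    base_order:    baseline ranking (list of doc_ids)
    delta:         dead zone; smaller confidence changes keep baseline order
    gate:          skip reranking entirely when the query is already confident
    """
    base_conf = consistency(query_samples)
    if base_conf >= gate:          # query-level gate: no intervention needed
        return list(base_order)
    keys = {}
    for rank, doc_id in enumerate(base_order):
        gain = consistency(doc_samples[doc_id]) - base_conf
        # bucket: +1 promote, -1 demote, 0 keep baseline position
        bucket = 0 if abs(gain) < delta else (1 if gain > 0 else -1)
        keys[doc_id] = (-bucket, -gain, rank)
    return sorted(base_order, key=lambda d: keys[d])

# Query-only answers disagree (confidence 0.5), so documents are rescored:
q = ["paris", "london", "paris", "rome"]
docs = {
    "d1": ["paris"] * 4,                            # raises confidence -> promoted
    "d2": ["paris", "london", "rome", "berlin"],    # lowers confidence -> demoted
    "d3": ["paris", "paris", "london", "rome"],     # unchanged -> keeps baseline slot
}
print(car_rerank(q, docs, ["d2", "d3", "d1"]))  # -> ['d1', 'd3', 'd2']
```

In a real pipeline the sampled answers would come from the generator LLM at nonzero temperature, and consistency would be judged semantically (e.g., via entailment or embedding similarity) rather than by exact string match.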