Aligning Dense Retrievers with LLM Utility via Distillation

arXiv cs.AI / 4/27/2026


Key Points

  • The paper proposes UAE (Utility-Aligned Embeddings) to improve dense vector retrieval for RAG by aligning embedding similarity with an LLM’s retrieval utility signals.
  • UAE trains a bi-encoder using a distribution-matching formulation and a Utility-Modulated InfoNCE objective that derives graded utility from perplexity reduction, avoiding any test-time LLM re-ranking (see the sketch after this list).
  • By injecting utility signals directly into the embedding space, UAE aims to overcome precision limitations of pure similarity search while sidestepping the computational cost and noise of LLM-based re-ranking.
  • Experiments on the QASPER benchmark show large gains over a strong semantic baseline (BGE-Base), including +30.59% Recall@1, +30.16% MAP, and +17.3% Token F1.
  • UAE is reported to be over 180× faster than efficient LLM re-ranking methods while maintaining competitive quality, enabling scalable retrieval for RAG systems.
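The paper's exact loss is not reproduced here, but a distribution-matching objective of this kind has a natural reading: make the retriever's softmax over query-passage similarities imitate a softmax over graded utility scores. Below is a minimal PyTorch sketch under that assumption; the function name `utility_modulated_infonce` and the temperatures `tau_sim` and `tau_util` are illustrative choices, not names from the paper.

```python
import torch
import torch.nn.functional as F

def utility_modulated_infonce(query_emb, doc_embs, utilities,
                              tau_sim=0.05, tau_util=0.5):
    """Sketch of a distribution-matching retrieval loss.

    query_emb: (d,)   embedding of one query
    doc_embs:  (n, d) embeddings of the candidate passages
    utilities: (n,)   graded utility scores (e.g., perplexity reduction)
    """
    # Cosine similarities between the query and each candidate.
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_embs, dim=-1)
    sims = d @ q  # (n,)

    # Target distribution: softmax over utility scores, so more useful
    # passages receive more probability mass.
    target = F.softmax(utilities / tau_util, dim=-1)

    # Predicted distribution from the bi-encoder's similarities.
    log_pred = F.log_softmax(sims / tau_sim, dim=-1)

    # KL(target || pred): train the retriever to imitate the utility
    # distribution, injecting graded utility into the embedding space.
    return F.kl_div(log_pred, target, reduction="sum")
```

With a one-hot target (a single positive passage and everything else at zero), this reduces to standard InfoNCE, which is one way to see the objective as a "utility-modulated" generalization of contrastive training.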

Abstract

Dense vector retrieval is the practical backbone of Retrieval-Augmented Generation (RAG), but similarity search can suffer from precision limitations. Conversely, utility-based approaches leveraging LLM re-ranking often achieve superior performance but are computationally prohibitive and prone to the noise inherent in perplexity estimation. We propose Utility-Aligned Embeddings (UAE), a framework designed to merge these advantages into a practical, high-performance retrieval method. We formulate retrieval as a distribution-matching problem, training a bi-encoder to imitate a utility distribution derived from perplexity reduction using a Utility-Modulated InfoNCE objective. This approach injects graded utility signals directly into the embedding space without requiring test-time LLM inference. On the QASPER benchmark, UAE improves retrieval Recall@1 by 30.59%, MAP by 30.16%, and Token F1 by 17.3% over the strong semantic baseline BGE-Base. Crucially, UAE is over 180× faster than efficient LLM re-ranking methods while preserving competitive performance, demonstrating that aligning retrieval with generative utility yields reliable contexts at scale.
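The abstract's "utility derived from perplexity reduction" suggests scoring a passage by how much it lowers an LM's perplexity on a reference answer. Here is a hedged sketch of that idea using Hugging Face transformers; the gpt2 scorer, the prompt templates, and the separate tokenization of prompt versus prompt+answer are simplifying assumptions, not the paper's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder scorer LM; any causal LM would do for this sketch.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def answer_nll(prompt: str, answer: str) -> float:
    """Mean negative log-likelihood of the answer tokens given the prompt.

    Tokenizing prompt and prompt+answer separately can misalign by a
    token at the boundary; acceptable for a sketch, not for production.
    """
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + answer, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # score only the answer span
    return lm(full_ids, labels=labels).loss.item()

def utility(query: str, passage: str, answer: str) -> float:
    """Graded utility as perplexity reduction: how much the passage
    lowers the LM's NLL of the reference answer for this query."""
    base = answer_nll(f"Question: {query}\nAnswer: ", answer)
    with_ctx = answer_nll(
        f"Context: {passage}\nQuestion: {query}\nAnswer: ", answer
    )
    return base - with_ctx  # positive => the passage helps generation
```

Scores like these would be computed once, offline, for training pairs; at test time UAE needs only the bi-encoder, which is where the reported 180× speedup over LLM re-ranking comes from.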