Aligning Dense Retrievers with LLM Utility via Distillation

arXiv cs.AI / 4/27/2026


Key Points

  • The paper proposes UAE (Utility-Aligned Embeddings) to improve dense vector retrieval for RAG by aligning embedding similarity with an LLM’s retrieval utility signals.
  • UAE trains a bi-encoder using a distribution-matching formulation and a Utility-Modulated InfoNCE objective that derives graded utility from perplexity reduction, avoiding any test-time LLM re-ranking (see the sketch after this list).
  • By injecting utility signals directly into the embedding space, UAE aims to overcome precision limitations of pure similarity search while sidestepping the computational cost and noise of LLM-based re-ranking.
  • Experiments on the QASPER benchmark show large gains over a strong semantic baseline (BGE-Base), including +30.59% Recall@1, +30.16% MAP, and +17.3% Token F1.
  • UAE is reported to be over 180× faster than efficient LLM re-ranking methods while maintaining competitive quality, enabling scalable retrieval for RAG systems.
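The paper's exact loss is not reproduced here, but a distribution-matching objective of this kind has a natural reading: make the retriever's softmax over query-passage similarities imitate a softmax over graded utility scores. Below is a minimal PyTorch sketch under that assumption; the function name `utility_modulated_infonce` and the temperatures `tau_sim` and `tau_util` are illustrative choices, not names from the paper.

```python
import torch
import torch.nn.functional as F

def utility_modulated_infonce(query_emb, doc_embs, utilities,
                              tau_sim=0.05, tau_util=0.5):
    """Sketch of a distribution-matching retrieval loss.

    query_emb: (d,)   embedding of one query
    doc_embs:  (n, d) embeddings of the candidate passages
    utilities: (n,)   graded utility scores (e.g., perplexity reduction)
    """
    # Cosine similarities between the query and each candidate.
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_embs, dim=-1)
    sims = d @ q  # (n,)

    # Target distribution: softmax over utility scores, so more useful
    # passages receive more probability mass.
    target = F.softmax(utilities / tau_util, dim=-1)

    # Predicted distribution from the bi-encoder's similarities.
    log_pred = F.log_softmax(sims / tau_sim, dim=-1)

    # KL(target || pred): train the retriever to imitate the utility
    # distribution, injecting graded utility into the embedding space.
    return F.kl_div(log_pred, target, reduction="sum")
```

With a one-hot target (a single positive passage and everything else at zero), this reduces to standard InfoNCE, which is one way to see the objective as a "utility-modulated" generalization of contrastive training.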

Abstract

Dense vector retrieval is the practical backbone of Retrieval-Augmented Generation (RAG), but similarity search can suffer from precision limitations. Conversely, utility-based approaches leveraging LLM re-ranking often achieve superior performance but are computationally prohibitive and prone to the noise inherent in perplexity estimation. We propose Utility-Aligned Embeddings (UAE), a framework designed to merge these advantages into a practical, high-performance retrieval method. We formulate retrieval as a distribution-matching problem, training a bi-encoder to imitate a utility distribution derived from perplexity reduction using a Utility-Modulated InfoNCE objective. This approach injects graded utility signals directly into the embedding space without requiring test-time LLM inference. On the QASPER benchmark, UAE improves retrieval Recall@1 by 30.59%, MAP by 30.16%, and Token F1 by 17.3% over the strong semantic baseline BGE-Base. Crucially, UAE is over 180× faster than efficient LLM re-ranking methods while preserving competitive performance, demonstrating that aligning retrieval with generative utility yields reliable contexts at scale.
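The abstract's "utility derived from perplexity reduction" suggests scoring a passage by how much it lowers an LM's perplexity on a reference answer. Here is a hedged sketch of that idea using Hugging Face transformers; the gpt2 scorer, the prompt templates, and the separate tokenization of prompt versus prompt+answer are simplifying assumptions, not the paper's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder scorer LM; any causal LM would do for this sketch.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def answer_nll(prompt: str, answer: str) -> float:
    """Mean negative log-likelihood of the answer tokens given the prompt.

    Tokenizing prompt and prompt+answer separately can misalign by a
    token at the boundary; acceptable for a sketch, not for production.
    """
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + answer, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # score only the answer span
    return lm(full_ids, labels=labels).loss.item()

def utility(query: str, passage: str, answer: str) -> float:
    """Graded utility as perplexity reduction: how much the passage
    lowers the LM's NLL of the reference answer for this query."""
    base = answer_nll(f"Question: {query}\nAnswer: ", answer)
    with_ctx = answer_nll(
        f"Context: {passage}\nQuestion: {query}\nAnswer: ", answer
    )
    return base - with_ctx  # positive => the passage helps generation
```

Scores like these would be computed once, offline, for training pairs; at test time UAE needs only the bi-encoder, which is where the reported 180× speedup over LLM re-ranking comes from.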