ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression

arXiv cs.AI / 4/27/2026

💬 Opinion · Developer Stack & Infrastructure · Models & Research

Key Points

  • ResRank is an end-to-end unified retrieval + listwise reranking framework designed to overcome LLM reranking bottlenecks like the “lost in the middle” effect and super-linear inference latency from long passages.
  • It uses an Encoder-LLM to compress each candidate passage into a single embedding, which is then combined with the query and passed to a Reranker-LLM for listwise ranking.
  • A residual connection design is added to reduce misalignment between the compressed embedding space and the ranking space by merging encoder embeddings with reranker contextual hidden states.
  • ResRank avoids autoregressive generation by using a one-step cosine-similarity-based scoring mechanism, requiring zero generated tokens and only one token per passage, while being trained with a dual-stage, multi-task joint optimization strategy.
  • Experiments on TREC Deep Learning and eight BEIR datasets show that ResRank is competitive with, or better than, existing methods while substantially improving the effectiveness/efficiency trade-off.
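The pipeline described in the key points can be sketched in a few lines of PyTorch. This is an illustrative toy, not the paper's implementation: module choices, dimensions, and the pooling of the query representation are all assumptions; only the overall flow (compress each passage to one embedding, feed query tokens plus passage tokens to a reranker, residually merge encoder embeddings with contextual hidden states, then score with one-step cosine similarity) follows the summary above.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of the ResRank forward pass. A Linear layer and a single
# TransformerEncoderLayer stand in for the Encoder-LLM and Reranker-LLM.
torch.manual_seed(0)

d_model = 32          # shared hidden size of encoder and reranker (assumed)
num_passages = 5
q_len = 4             # number of query tokens

encoder = torch.nn.Linear(d_model, d_model)   # stand-in Encoder-LLM
reranker = torch.nn.TransformerEncoderLayer(  # stand-in Reranker-LLM
    d_model=d_model, nhead=4, batch_first=True
)

# 1) Encoder-LLM: compress each candidate passage into a single embedding.
passage_feats = torch.randn(num_passages, d_model)   # pooled passage features
passage_embs = encoder(passage_feats)                # (P, d), one token each

# 2) Reranker-LLM input: query tokens followed by one token per passage,
#    so the sequence length is q_len + P instead of q_len + full texts.
query_tokens = torch.randn(1, q_len, d_model)
seq = torch.cat([query_tokens, passage_embs.unsqueeze(0)], dim=1)
hidden = reranker(seq)                               # contextualized states

# 3) Residual connection: merge each passage's encoder embedding with its
#    contextual hidden state to align the two representation spaces.
passage_hidden = hidden[0, q_len:]                   # (P, d)
passage_repr = passage_hidden + passage_embs         # residual merge

# 4) One-step scoring: cosine similarity against a pooled query state,
#    so zero tokens are autoregressively generated.
query_repr = hidden[0, :q_len].mean(dim=0)
scores = F.cosine_similarity(passage_repr, query_repr.unsqueeze(0), dim=-1)
ranking = scores.argsort(descending=True)            # best passage first
```

Because step 4 is a single similarity computation rather than token-by-token decoding, latency grows with the number of candidates (one token each), not with the total passage text length.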

Abstract

Large language model (LLM) based listwise reranking has emerged as the dominant paradigm for achieving state-of-the-art ranking effectiveness in information retrieval. However, its reliance on feeding full passage texts into the LLM introduces two critical bottlenecks: the "lost in the middle" phenomenon degrades ranking quality as input length grows, and the inference latency scales super-linearly with sequence length, rendering it impractical for industrial deployment. In this paper, we present ResRank, a unified retrieval-reranking framework that fundamentally addresses both challenges. Inspired by multimodal LLMs that project visual inputs into compact token representations, ResRank employs an Encoder-LLM to compress each candidate passage into a single embedding, which is then fed alongside the query text into a Reranker-LLM for listwise ranking. To alleviate the misalignment between the compressed representation space and the ranking space, we introduce a residual connection structure that combines encoder embeddings with contextualized hidden states from the reranker. Furthermore, we replace the conventional autoregressive decoding with a one-step cosine-similarity-based scoring mechanism, eliminating the generation bottleneck entirely. ResRank is trained through a carefully designed dual-stage, multi-task, end-to-end joint optimization strategy that simultaneously trains the encoder and reranker, achieving learning objective alignment between retrieval and reranking while substantially reducing training complexity. Extensive experiments on TREC Deep Learning and eight BEIR benchmark datasets demonstrate that ResRank achieves competitive or superior ranking effectiveness compared to existing approaches while requiring zero generated tokens and processing only one token per passage, yielding a fundamentally better balance between effectiveness and efficiency.
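The abstract's "multi-task, end-to-end joint optimization" of encoder and reranker could plausibly combine a contrastive retrieval loss on the compressed embeddings with a listwise loss on the reranker's scores. The sketch below is a guess at that shape, not the paper's actual objective: the InfoNCE-style retrieval loss, the softmax listwise loss, the temperature, and the equal loss weighting are all assumptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical multi-task joint objective: retrieval + listwise ranking,
# backpropagated through both components at once.
torch.manual_seed(0)

P, d = 8, 32
query_emb = torch.randn(d, requires_grad=True)        # from the Encoder-LLM
passage_embs = torch.randn(P, d, requires_grad=True)  # compressed passages
rerank_scores = torch.randn(P, requires_grad=True)    # from the Reranker-LLM
positive_idx = torch.tensor([2])                      # the relevant passage

# Retrieval objective: InfoNCE over cosine similarities, pulling the query
# embedding toward its relevant passage (temperature 0.05 is an assumption).
sims = F.cosine_similarity(passage_embs, query_emb.unsqueeze(0), dim=-1)
loss_retrieval = F.cross_entropy(sims.unsqueeze(0) / 0.05, positive_idx)

# Listwise ranking objective: softmax cross-entropy over reranker scores,
# treating the candidate list as one classification problem.
loss_rank = F.cross_entropy(rerank_scores.unsqueeze(0), positive_idx)

# Joint optimization: one backward pass updates encoder and reranker together,
# aligning the retrieval and reranking learning objectives.
loss = loss_retrieval + loss_rank
loss.backward()
```

Training both losses through a shared graph is what lets gradient signal from the ranking task reshape the compressed embedding space, which is the misalignment the residual connection and joint training are meant to address.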