Hybrid Retrieval for COVID-19 Literature: Comparing Rank Fusion and Projection Fusion with Diversity Reranking

arXiv cs.CL · April 16, 2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The paper introduces a hybrid COVID-19 literature retrieval system on the TREC-COVID benchmark, combining sparse models (SPLADE), dense models (BGE), and fusion strategies (RRF and projection-based B5) to improve relevance and diversity.
  • Rank-level fusion (RRF) delivers the best overall retrieval quality with nDCG@10 of 0.828, outperforming dense-only and sparse-only baselines, while projection fusion (B5) trades off some relevance for stronger latency and diversity metrics.
  • The B5 projection fusion variant achieves nDCG@10 of 0.678 but is 33% faster (847 ms vs. 1271 ms) and yields 2.2x higher ILD@10 than RRF, with the largest relative gains on keyword-heavy reformulations.
  • Applying diversity-oriented reranking via MMR increases intra-list diversity by about 24% (ILD improvements) but reduces effectiveness by roughly 20–25% in nDCG@10, quantifying the relevance–diversity tradeoff.
  • The system is implemented as a deployed Streamlit web application using Pinecone serverless indices, and it keeps end-to-end latency under the 2-second target across multiple query types.
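Reciprocal Rank Fusion, the rank-level strategy that tops the paper's results, combines ranked lists by scoring each document as the sum of 1/(k + rank) over the lists it appears in. A minimal sketch follows; the constant k=60 and the toy SPLADE/BGE rankings are illustrative assumptions, not values reported in the paper.

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
    where rank is 1-based and documents absent from a list contribute nothing."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from a sparse and a dense retriever.
sparse = ["d1", "d2", "d3"]   # e.g. a SPLADE ranking
dense  = ["d1", "d4", "d2"]   # e.g. a BGE ranking
print(rrf_fuse([sparse, dense]))  # → ['d1', 'd2', 'd4', 'd3']
```

Because RRF uses only ranks, it needs no score normalization across the sparse and dense retrievers, which is one reason it is a common default for hybrid retrieval.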

Abstract

We present a hybrid retrieval system for COVID-19 scientific literature, evaluated on the TREC-COVID benchmark (171,332 papers, 50 expert queries). The system implements six retrieval configurations spanning sparse (SPLADE), dense (BGE), rank-level fusion (RRF), and a projection-based vector fusion (B5) approach. RRF fusion achieves the best relevance (nDCG@10 = 0.828), outperforming dense-only by 6.1% and sparse-only by 14.9%. Our projection fusion variant reaches nDCG@10 = 0.678 on expert queries while being 33% faster (847 ms vs. 1271 ms) and producing 2.2x higher ILD@10 than RRF. Evaluation across 400 queries -- including expert, machine-generated, and three paraphrase styles -- shows that B5 delivers the largest relative gain on keyword-heavy reformulations (+8.8%), although RRF remains best in absolute nDCG@10. On expert queries, MMR reranking increases intra-list diversity by 23.8-24.5% at a 20.4-25.4% nDCG@10 cost. Both fusion pipelines evaluated for latency remain below the sub-2 s target across all query sets. The system is deployed as a Streamlit web application backed by Pinecone serverless indices.
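The MMR reranking behind the relevance–diversity tradeoff can be sketched as a greedy loop that trades each candidate's relevance against its maximum similarity to documents already selected, with ILD as the diversity metric. This is a minimal illustration, assuming cosine similarity over document embeddings and λ = 0.5; the paper does not specify these parameters here, and the toy vectors are invented.

```python
import math

def cos(a, b):
    """Cosine similarity between two dense vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def mmr(query_scores, vecs, lam=0.5, k=10):
    """Maximal Marginal Relevance: greedily pick the doc maximizing
    lam * relevance - (1 - lam) * max similarity to already-selected docs."""
    candidates, selected = list(query_scores), []
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda d: lam * query_scores[d]
                   - (1 - lam) * max((cos(vecs[d], vecs[s]) for s in selected),
                                     default=0.0))
        selected.append(best)
        candidates.remove(best)
    return selected

def ild(selected, vecs):
    """Intra-list diversity: mean pairwise (1 - cosine similarity)."""
    pairs = [(a, b) for i, a in enumerate(selected) for b in selected[i + 1:]]
    return sum(1 - cos(vecs[a], vecs[b]) for a, b in pairs) / len(pairs)

# Toy example: d2 is nearly a duplicate of d1, d3 is orthogonal but less relevant.
vecs = {"d1": [1.0, 0.0], "d2": [0.99, 0.1], "d3": [0.0, 1.0]}
rel = {"d1": 1.0, "d2": 0.9, "d3": 0.5}
top2 = mmr(rel, vecs, lam=0.5, k=2)
print(top2)  # → ['d1', 'd3']: MMR skips the near-duplicate d2
```

Pure relevance ranking would return [d1, d2] with near-zero ILD; MMR swaps in d3, raising diversity at the cost of ranking a less relevant document — the same tradeoff the paper quantifies as a 23.8–24.5% ILD gain for a 20.4–25.4% nDCG@10 drop.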