All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG

arXiv cs.CL / 4/23/2026

📰 News · Models & Research

Key Points

  • The paper finds that multilingual RAG (mRAG) systems exhibit language bias in the reranking stage, tending to prioritize English and the query’s native language over other languages.
  • Using an estimated “oracle evidence” analysis, the authors quantify a sizable performance gap between existing rerankers and the theoretical upper bound of what reranking could achieve.
  • They identify a key distribution mismatch: optimal answers require evidence dispersed across multiple languages, but current systems suppress these “answer-critical” documents.
  • To address this, the authors propose LAURA (Language-Agnostic Utility-driven Reranker Alignment), which aligns multilingual evidence ranking with downstream generative usefulness.
  • Experiments across multiple languages and generation models show that LAURA reduces language bias and yields consistent mRAG performance improvements.
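To make the core idea concrete, here is a toy sketch of language-agnostic, utility-driven reranking. The interface and the `answer_utility` stand-in are hypothetical illustrations, not the paper's actual LAURA training objective: the point is only that candidates are ordered by estimated downstream usefulness while their language is deliberately ignored, so non-English evidence is not suppressed.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    lang: str
    retrieval_score: float  # relevance from the first-stage retriever

def answer_utility(doc: Doc, query: str) -> float:
    """Placeholder for a learned utility model. In LAURA-style alignment this
    would reflect downstream generative usefulness; here we use a crude,
    hypothetical stand-in (token overlap with the query)."""
    q = set(query.lower().split())
    d = set(doc.text.lower().split())
    return len(q & d) / max(len(q), 1)

def utility_rerank(docs: list[Doc], query: str, top_k: int = 3) -> list[Doc]:
    # Rank purely by estimated utility; doc.lang and the original
    # retrieval_score are deliberately not consulted, so evidence in any
    # language can rise to the top.
    return sorted(docs, key=lambda d: answer_utility(d, query), reverse=True)[:top_k]
```

In this sketch, a document with a low first-stage retrieval score but high estimated answer utility outranks a higher-scored but less useful one, regardless of language.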

Abstract

Multilingual Retrieval-Augmented Generation (mRAG) leverages cross-lingual evidence to ground Large Language Models (LLMs) in global knowledge. However, we show that current mRAG systems suffer from a language bias during reranking, systematically favoring English and the query's native language. By introducing an estimated oracle evidence analysis, we quantify a substantial performance gap between existing rerankers and the achievable upper bound. Further analysis reveals a critical distributional mismatch: while optimal predictions require evidence scattered across multiple languages, current systems systematically suppress such "answer-critical" documents, thereby limiting downstream generation performance. To bridge this gap, we propose Language-Agnostic Utility-driven Reranker Alignment (LAURA), which aligns multilingual evidence ranking with downstream generative utility. Experiments across diverse languages and generation models show that LAURA effectively mitigates language bias and consistently improves mRAG performance.