All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG

arXiv cs.CL / 4/23/2026

📰 News · Models & Research

Key Points

  • The paper finds that multilingual RAG (mRAG) systems exhibit language bias in the reranking stage, tending to prioritize English and the query’s native language over other languages.
  • Using an estimated “oracle evidence” analysis, the authors quantify a sizable performance gap between existing rerankers and the theoretical upper bound of what reranking could achieve.
  • They identify a key distribution mismatch: optimal answers require evidence dispersed across multiple languages, but current systems suppress these “answer-critical” documents.
  • To address this, the authors propose LAURA (Language-Agnostic Utility-driven Reranker Alignment), which aligns multilingual evidence ranking with downstream generative usefulness.
  • Experiments across multiple languages and generation models show that LAURA reduces language bias and yields consistent mRAG performance improvements.
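To make the core idea concrete, here is a toy sketch of language-agnostic, utility-driven reranking. The interface and the `answer_utility` stand-in are hypothetical illustrations, not the paper's actual LAURA training objective: the point is only that candidates are ordered by estimated downstream usefulness while their language is deliberately ignored, so non-English evidence is not suppressed.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    lang: str
    retrieval_score: float  # relevance from the first-stage retriever

def answer_utility(doc: Doc, query: str) -> float:
    """Placeholder for a learned utility model. In LAURA-style alignment this
    would reflect downstream generative usefulness; here we use a crude,
    hypothetical stand-in (token overlap with the query)."""
    q = set(query.lower().split())
    d = set(doc.text.lower().split())
    return len(q & d) / max(len(q), 1)

def utility_rerank(docs: list[Doc], query: str, top_k: int = 3) -> list[Doc]:
    # Rank purely by estimated utility; doc.lang and the original
    # retrieval_score are deliberately not consulted, so evidence in any
    # language can rise to the top.
    return sorted(docs, key=lambda d: answer_utility(d, query), reverse=True)[:top_k]
```

In this sketch, a document with a low first-stage retrieval score but high estimated answer utility outranks a higher-scored but less useful one, regardless of language.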

Abstract

Multilingual Retrieval-Augmented Generation (mRAG) leverages cross-lingual evidence to ground Large Language Models (LLMs) in global knowledge. However, we show that current mRAG systems suffer from a language bias during reranking, systematically favoring English and the query's native language. By introducing an estimated oracle evidence analysis, we quantify a substantial performance gap between existing rerankers and the achievable upper bound. Further analysis reveals a critical distributional mismatch: while optimal predictions require evidence scattered across multiple languages, current systems systematically suppress such "answer-critical" documents, thereby limiting downstream generation performance. To bridge this gap, we propose Language-Agnostic Utility-driven Reranker Alignment (LAURA), which aligns multilingual evidence ranking with downstream generative utility. Experiments across diverse languages and generation models show that LAURA effectively mitigates language bias and consistently improves mRAG performance.