CroSearch-R1: Better Leveraging Cross-lingual Knowledge for Retrieval-Augmented Generation

arXiv cs.CL / 4/29/2026

Key Points

  • The paper argues that multilingual corpora can improve Retrieval-Augmented Generation (RAG) by correcting and supplementing facts, but that naively concatenating knowledge from different languages into the context may fail to improve effectiveness because of disparities across languages.
  • It proposes CroSearch-R1, a search-augmented reinforcement learning framework that integrates multilingual knowledge into the GRPO process rather than simply appending knowledge snippets.
  • CroSearch-R1 uses a multi-turn retrieval strategy with cross-lingual knowledge integration to align evidence from different languages into a unified representation space.
  • It also introduces a multilingual rollout mechanism aimed at improving reasoning transferability across languages, and reports improved RAG effectiveness on multilingual collections.

Abstract

A multilingual collection may contain useful knowledge in other languages to supplement and correct the facts in the original language for Retrieval-Augmented Generation (RAG). However, the vanilla approach that simply concatenates multiple pieces of knowledge from different languages into the context may fail to improve effectiveness due to the potential disparities across languages. To better leverage multilingual knowledge, we propose CroSearch-R1, a search-augmented reinforcement learning framework to integrate multilingual knowledge into the Group Relative Policy Optimization (GRPO) process. In particular, the approach adopts a multi-turn retrieval strategy with cross-lingual knowledge integration to dynamically align the knowledge from other languages as supplementary evidence into a unified representation space. Furthermore, we introduce a multilingual rollout mechanism to optimize reasoning transferability across languages. Experimental results demonstrate that our framework effectively leverages cross-lingual complementarity and improves the effectiveness of RAG with multilingual collections.
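
For readers unfamiliar with GRPO, the following is a minimal sketch of its group-relative advantage step: several rollouts are sampled for the same question, and each rollout's reward is normalized against the mean and standard deviation of its own group. It is not the CroSearch-R1 implementation. The function name, the example rewards, the use of the population standard deviation, and the `eps` term are illustrative assumptions, and the abstract does not specify how multilingual rollouts are grouped or rewarded.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group.

    Hypothetical helper: population std and eps are assumptions made
    for this sketch, not details taken from the paper.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four rollouts of one question (1.0 = correct answer).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat their group's average receive a positive advantage and the rest a negative one, so the policy update needs no separate value model. If every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.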