Effects of Cross-lingual Evidence in Multilingual Medical Question Answering

arXiv cs.CL / 4/23/2026

Key Points

  • The paper studies multilingual medical question answering in both high-resource languages (English, Spanish, French, Italian) and low-resource languages (Basque, Kazakh), analyzing how different forms of external evidence affect performance.
  • It compares three external evidence types (curated medical knowledge repositories, web-retrieved content, and LLM parametric explanations) across models of different sizes.
  • In baseline evaluations, larger models consistently perform better in English, but the best external evidence strategy varies by language resource level.
  • For high-resource languages, English web-retrieved data is the most beneficial, while for low-resource languages the best approach is cross-lingual retrieval using both English and the target language.
  • The study argues that external knowledge does not universally improve outcomes and highlights limitations of specialized sources like PubMed due to insufficient multilingual coverage.

Abstract

This paper investigates Multilingual Medical Question Answering across high-resource (English, Spanish, French, Italian) and low-resource (Basque, Kazakh) languages. We evaluate three types of external evidence sources across models of varying size: curated repositories of specialized medical knowledge, web-retrieved content, and explanations from LLMs' parametric knowledge. Moreover, we conduct experiments with multilingual, monolingual, and cross-lingual retrieval. Our results demonstrate that larger models consistently achieve superior performance in English across baseline evaluations. When incorporating external knowledge, web-retrieved data in English proves most beneficial for high-resource languages. Conversely, for low-resource languages, the most effective strategy combines retrieval in both English and the target language, achieving accuracy comparable to that of high-resource languages. These findings challenge the assumption that external knowledge systematically improves performance and reveal that effective strategies depend on both the source of language resources and model scale. Furthermore, specialized medical knowledge sources such as PubMed are limited: while they provide authoritative expert knowledge, they lack adequate multilingual coverage.
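
As a rough illustration of the strategy reported as best for low-resource languages (retrieving in both English and the target language), the sketch below interleaves the top-ranked passages from each language into one evidence list. It is not the paper's implementation: the `Passage` type, the `score` field, the `combine_evidence` helper, and the interleaving rule are all assumptions made for this example, and the abstract does not say how the two retrieval results are actually merged.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class Passage:
    lang: str     # language code of the passage, e.g. "en" or "eu" (assumed field)
    text: str
    score: float  # hypothetical retriever relevance score; higher is better


def top_k(passages, lang, k):
    """Return the k best-scoring passages written in `lang`."""
    in_lang = [p for p in passages if p.lang == lang]
    return sorted(in_lang, key=lambda p: p.score, reverse=True)[:k]


def combine_evidence(passages, target_lang, k=4):
    """Interleave English and target-language passages, at most k in total.

    Assumption: alternating between the two ranked lists is just one simple
    way to mix evidence; other merging rules are equally plausible.
    """
    if target_lang == "en":
        return top_k(passages, "en", k)
    english = top_k(passages, "en", k)
    target = top_k(passages, target_lang, k)
    merged = []
    for en_p, tgt_p in zip_longest(english, target):
        for p in (en_p, tgt_p):
            if p is not None:
                merged.append(p)
    return merged[:k]
```

For a Basque question, `combine_evidence(pool, "eu", k=4)` would alternate English and Basque passages by rank, while an English question falls back to English-only retrieval, which the summary above reports as the most beneficial setting for high-resource languages.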