Left Behind: Cross-Lingual Transfer as a Bridge for Low-Resource Languages in Large Language Models

arXiv cs.CL / 3/24/2026


Key Points

  • The paper benchmarks eight large language models in English, Kazakh, and Mongolian across five experimental conditions, using 50 hand-crafted questions, and evaluates 2,000 responses on accuracy, fluency, and completeness.
  • Results show a consistent 13.8–16.7 percentage point gap relative to English: models often preserve surface-level fluency in the low-resource languages but generate substantially less accurate outputs.
  • Cross-lingual transfer prompting (reason in English, then translate back) yields selective improvements for bilingual architectures (+2.2pp to +4.3pp) but offers no benefit for English-dominant models.
  • The study concludes that current LLMs systematically under-serve low-resource language communities and that mitigation effectiveness depends on model architecture rather than a single universal prompting strategy.

Abstract

We investigate how large language models perform on low-resource languages by benchmarking eight LLMs across five experimental conditions in English, Kazakh, and Mongolian. Using 50 hand-crafted questions spanning factual, reasoning, technical, and culturally grounded categories, we evaluate 2,000 responses (8 models × 5 conditions × 50 questions) on accuracy, fluency, and completeness. We find a consistent performance gap of 13.8–16.7 percentage points between English and the low-resource-language conditions, with models maintaining surface-level fluency while producing significantly less accurate content. Cross-lingual transfer (prompting models to reason in English before translating back) yields selective gains for bilingual architectures (+2.2pp to +4.3pp) but provides no benefit to English-dominant models. Our results demonstrate that current LLMs systematically underserve low-resource language communities, and that effective mitigation strategies are architecture-dependent rather than universal.
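
To make the cross-lingual transfer condition concrete, the sketch below shows the two-step prompt flow the paper describes: reason in English first, then translate the answer back into the target language. This is a minimal illustration under stated assumptions, not the paper's released code: the generate callable stands in for whatever LLM API is actually used, and the prompt wording is hypothetical.

```python
# A minimal sketch of cross-lingual transfer prompting, assuming a generic
# `generate(prompt) -> str` LLM call. Prompt text and function names are
# illustrative placeholders, not the paper's implementation.

from typing import Callable


def cross_lingual_transfer(
    question: str,
    language: str,
    generate: Callable[[str], str],
) -> str:
    """Answer a low-resource-language question by reasoning in English first."""
    # Step 1: ask the model to restate the question in English and reason there,
    # where its training data is densest.
    english_reasoning = generate(
        f"The following question is written in {language}:\n\n{question}\n\n"
        "Translate it to English, then answer it in English, "
        "showing your reasoning step by step."
    )
    # Step 2: translate the English answer back into the original language,
    # so the final response is usable by the end user.
    return generate(
        f"Translate the following answer into {language}, preserving all "
        f"facts and reasoning:\n\n{english_reasoning}"
    )


# Example usage with a stubbed model call (replace with a real LLM client):
if __name__ == "__main__":
    def fake_generate(prompt: str) -> str:
        return f"[model output for a prompt of {len(prompt)} characters]"

    print(
        cross_lingual_transfer(
            "Монгол Улсын нийслэл хаана вэ?", "Mongolian", fake_generate
        )
    )
```

Per the paper's findings, whether this two-step strategy helps depends on the model: bilingual architectures gain a few percentage points, while English-dominant models see no benefit.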