Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs

arXiv cs.CL / 5/5/2026


Key Points

  • The paper argues that modern multilingual NLP relies on “incidental multilingualism”: LLMs appear multilingual mainly because of massive, uneven web training data, not because multilingual competence was an explicit design objective.
  • It claims this approach leads to uneven, brittle, and hard-to-interpret behavior across languages, which can cause serious failures in real-world and agentic settings requiring reasoning and action in multiple linguistic contexts.
  • The authors conduct an empirical study comparing (1) the languages models claim to support with (2) the languages they actually respond in when given multilingual prompts.
  • They show that even a simple language-change attack can surface these cross-lingual weaknesses and expose hidden assumptions about language (a minimal probe of this idea is sketched after this list).
  • The paper calls for “multilingualism by design,” proposing a research agenda that prioritizes equitable multilingual performance, cultural grounding, and cross-lingual behavioral understanding across the full model pipeline.
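
As a concrete illustration (not the authors' code), the sketch below probes one of the two questions above: does a model answer in the same language it was prompted in? It assumes an OpenAI-style chat API via the `openai` package and the `langdetect` library for language identification; the model name, prompt set, and pass/fail rule are illustrative placeholders.

```python
# Minimal sketch: check whether a model's response language matches the
# prompt language. Requires `pip install openai langdetect` and an
# OPENAI_API_KEY in the environment; all specifics are assumptions.
from langdetect import detect  # returns an ISO 639-1 code for short text
from openai import OpenAI

client = OpenAI()

# The same question rendered in several languages: a toy "language-change
# attack" in which only the prompt language varies, not the content.
PROMPTS = {
    "en": "What is the capital of Kenya?",
    "sw": "Mji mkuu wa Kenya ni upi?",
    "de": "Was ist die Hauptstadt von Kenia?",
}

def probe(model: str = "gpt-4o-mini") -> None:
    for lang, prompt in PROMPTS.items():
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        detected = detect(reply)  # language the model actually answered in
        verdict = "OK" if detected == lang else "MISMATCH"
        print(f"prompt={lang} reply={detected} [{verdict}] {reply[:60]!r}")

if __name__ == "__main__":
    probe()
```

A MISMATCH line is the kind of failure the paper highlights: the model silently falls back to a dominant language (often English) instead of the one it was addressed in, a behavior that a per-language accuracy benchmark alone would not expose.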

Abstract

This paper argues that contemporary multilingual NLP has converged on a fragile and misleading paradigm of incidental multilingualism. Today's LLMs appear multilingual largely because they are trained on massive, uneven web corpora, not because multilingual or multicultural competence has been treated as a core design objective. We contend that this paradigm systematically produces unequal, brittle, and opaque behavior across languages, with severe consequences in real-world and agentic deployments where models must reason, plan, and act across multiple linguistic contexts. We report a focused empirical study of two practical questions: which languages models self-report as supported and which languages they actually respond in across multilingual prompts. We additionally demonstrate how even a simple language-change attack can surface these failures and expose hidden assumptions about language in LLM-based systems. To address this, we call for a shift toward multilingualism by design: a research agenda that treats equitable multilingual performance, cultural grounding, and cross-lingual behavioral understanding as first-class goals in all aspects of the model pipeline.