Information Asymmetry across Language Varieties: A Case Study on Cantonese-Mandarin and Bavarian-German QA
arXiv cs.CL, March 17, 2026
Key Points
- The authors construct a novel QA dataset, built from Wikipedia as the knowledge source, to study information asymmetry between local and mainstream language editions (Cantonese vs. Mandarin; Bavarian vs. German).
- Experiments show that LLMs largely fail to answer questions whose supporting information appears only in the local editions, though providing context from article lead sections or translating it can substantially improve performance.
- The findings demonstrate the value of local Wikipedia editions for both regional and global information, and raise questions about the inclusivity and cultural coverage of LLMs.
- The work suggests directions to improve LLMs by leveraging localized sources and translations to close knowledge gaps across language varieties.
