Leveraging Wikidata for Geographically Informed Sociocultural Bias Dataset Creation: Application to Latin America
arXiv cs.AI / 3/12/2026
Key Points
- The paper proposes leveraging Wikipedia content, the Wikidata knowledge graph, and social science expertise to create a dataset of culturally informed Q/A pairs for Latin American contexts.
- The authors construct LatamQA, a dataset of over 26,000 question-answer pairs drawn from 26,000 Wikipedia articles. The pairs are converted into multiple-choice items in Spanish and Portuguese, which are then translated into English.
- The authors use LatamQA to benchmark several LLMs and report three findings: performance varies across Latin American countries, models perform better in their original language, and models show greater familiarity with Iberian Spanish than with Latin American variants.
- The work highlights data gaps in non-English LatAm contexts and provides a resource to measure and mitigate sociocultural bias in LLMs.
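To make the benchmarking idea in the key points concrete, here is a minimal Python sketch of how a question-answer pair could be turned into a multiple-choice item and how accuracy could be broken down by country. It is not the paper's pipeline: the `MCQItem` structure, the `make_mcq` and `accuracy_by_country` helpers, the way distractors are supplied, and the two toy questions are all assumptions made up for illustration, and the "predictions" stand in for model outputs.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MCQItem:
    """Hypothetical container for one multiple-choice item."""
    country: str        # country the source article is associated with
    question: str
    options: list       # answer options in shuffled order
    answer_index: int   # position of the correct option in `options`


def make_mcq(country, question, correct, distractors, rng):
    """Shuffle the correct answer in with the distractors (assumed distinct)."""
    options = [correct, *distractors]
    rng.shuffle(options)
    return MCQItem(country, question, options, options.index(correct))


def accuracy_by_country(items, predictions):
    """Return {country: fraction of that country's items answered correctly}."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for item, pred in zip(items, predictions):
        totals[item.country] += 1
        hits[item.country] += int(pred == item.answer_index)
    return {country: hits[country] / totals[country] for country in totals}


# Toy items invented for this sketch; they are not taken from LatamQA.
rng = random.Random(0)
items = [
    make_mcq("Peru", "What is the capital of Peru?",
             "Lima", ["Quito", "Bogotá", "Santiago"], rng),
    make_mcq("Brazil", "What is the capital of Brazil?",
             "Brasília", ["Lima", "Montevideo", "Asunción"], rng),
]

# Stand-in for model outputs: right on the first item, wrong on the second.
predictions = [
    items[0].answer_index,
    (items[1].answer_index + 1) % len(items[1].options),
]

print(accuracy_by_country(items, predictions))
```

With these stand-in predictions the script prints `{'Peru': 1.0, 'Brazil': 0.0}`. Comparing such per-country scores across models and across the Spanish, Portuguese, and English versions of the same items is the kind of disparity measurement the summary describes.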