Leveraging Wikidata for Geographically Informed Sociocultural Bias Dataset Creation: Application to Latin America
arXiv cs.AI / 3/12/2026
Key Points
- The paper proposes leveraging Wikipedia content, the Wikidata knowledge graph, and social science expertise to create a dataset of culturally informed Q/A pairs for Latin American contexts.
- The resulting dataset, LatamQA, contains over 26,000 question-answer pairs drawn from 26,000 Wikipedia articles. The pairs are converted into multiple-choice questions in Spanish and Portuguese, which are then translated into English.
- The authors benchmark several LLMs on LatamQA and report three findings: performance varies across Latin American countries, models score higher in their original language, and models are more familiar with Iberian Spanish than with Latin American variants.
- The work highlights data gaps in non-English Latin American contexts and provides a resource for measuring and mitigating sociocultural bias in LLMs.
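
To make the benchmarking step in the key points concrete, here is a minimal sketch of how multiple-choice results could be broken down by country and by question language. It is not the authors' evaluation code: the record fields (`country`, `language`, `gold`, `predicted`), the helper `accuracy_by_group`, and the toy records are all hypothetical and only illustrate the kind of comparison the summary describes.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: fraction of correct answers}, grouping records by `key`."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        # An item counts as correct when the model picked the gold option.
        if record["predicted"] == record["gold"]:
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}


# Hypothetical toy results, NOT data from the paper.
records = [
    {"country": "AR", "language": "es", "gold": "B", "predicted": "B"},
    {"country": "AR", "language": "en", "gold": "B", "predicted": "C"},
    {"country": "BR", "language": "pt", "gold": "A", "predicted": "A"},
    {"country": "BR", "language": "en", "gold": "A", "predicted": "A"},
    {"country": "PE", "language": "es", "gold": "D", "predicted": "A"},
]

by_country = accuracy_by_group(records, "country")
by_language = accuracy_by_group(records, "language")

print(by_country)
print(by_language)
```

Grouping the same records under different keys is what lets a single benchmark run surface both cross-country gaps and language effects.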