Versteasch du mi? Computational and Socio-Linguistic Perspectives on GenAI, LLMs, and Non-Standard Language
arXiv cs.CL / 3/31/2026
Key Points
- The paper argues that current GenAI and LLM design can be unfair to less-spoken languages and may widen the digital language divide rather than reduce it.
- It connects these outcomes to sociolinguistic critique, claiming that models are shaped by earlier processes of language standardisation tied to European nationalist and colonial histories.
- Using South Tyrolean dialects and Kurdish varieties as case studies, the authors examine how linguistic non-standardness interacts with model behavior and standardisation pressures.
- The work discusses both technical approaches for improving how LLMs handle non-standard language and the policy question of whether such efforts can support “democratic and decolonial” digital and machine learning strategies.