Language Ideologies in a Multilingual Society: An LLM-based Analysis of Luxembourgish News Comments

arXiv cs.CL / 5/1/2026

Key Points

  • The paper argues that detecting “language ideologies” in multilingual societies helps explain how identities and social belonging are constructed through discourse.
  • It proposes using large language models (LLMs) to replicate human-coded ideological categories from Luxembourgish news comments, testing multiple prompt conditions.
  • The researchers manually annotate a Luxembourgish user-comment corpus with predefined ideological labels and evaluate how closely LLM outputs match human annotations.
  • Because Luxembourgish is a low-resource language with limited representation in LLM training data, the study also tests whether machine-translating comments into high-resource languages improves ideology-detection performance.
  • Results indicate LLMs are not yet fully optimized for multi-class ideological annotation, but they can still serve as practical tools for identifying ideological content in text.

Abstract

Detecting language ideologies is a valuable yet complex task for understanding how identities are constructed through discourse. In Luxembourg's multicultural and multilingual society, language ideologies reflect more than simple preferences: they carry deep cultural and social meanings, shaping identities and social belonging. Following recent developments in applying Natural Language Processing tools to linguistics and social science, this paper explores the potential of large language models to assist in the detection of language ideologies. We manually annotate a corpus of user comments in Luxembourgish with predefined ideological categories and then evaluate the performance of large language models under varying prompt conditions to assess their ability to replicate these human annotations. Since Luxembourgish is a small language and poorly represented in the LLMs' training data, we also investigate whether machine-translating the data to high-resource languages increases performance on the ideology detection task. Our findings suggest that, while LLMs are not yet fully optimized for a multi-class ideological annotation task, they are practical tools to identify language ideological content.
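
To make the evaluation step concrete, here is a minimal sketch of how agreement between human annotations and LLM-assigned labels could be scored on a multi-class task. The abstract does not say which metrics the authors use, so Cohen's kappa and macro-averaged F1 are assumptions chosen here because they are common for annotation-replication studies. The label names ("purist", "pluralist", "none") and the toy data are hypothetical and are not the paper's categories or results.

```python
from collections import Counter


def cohen_kappa(gold, pred):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(gold)
    # Fraction of items on which the two annotators agree.
    observed = sum(g == p for g, p in zip(gold, pred)) / n
    # Agreement expected by chance, from each annotator's label distribution.
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    expected = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical label throughout
    return (observed - expected) / (1 - expected)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: human (gold) labels vs. LLM (pred) labels.
gold = ["purist", "pluralist", "none", "purist", "none", "pluralist"]
pred = ["purist", "none", "none", "purist", "pluralist", "pluralist"]

print(f"kappa    = {cohen_kappa(gold, pred):.3f}")
print(f"macro F1 = {macro_f1(gold, pred):.3f}")
```

The same two functions could be run once per prompt condition, and once per translated version of the corpus, to compare conditions on equal footing.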