Language, Place, and Social Media: Geographic Dialect Alignment in New Zealand

arXiv cs.CL / 4/20/2026


Key Points

  • The thesis examines geographic dialect alignment in New Zealand-related Reddit communities, asking how language use reflects users' place identity and patterns of language variation.
  • It combines qualitative analysis of user perceptions with computational analyses of user-informed lexical, morphosyntactic, and semantic variables to study language variation and change.
  • Results suggest that users generally associate language with place and that place-related communities form a contiguous speech community, though the alignment between geographic dialect communities and place-related communities remains complex.
  • Advanced language modeling (including static and diachronic Word2Vec embeddings) identifies semantic differences across communities and captures meaningful semantic shifts in New Zealand English.
  • The work produces a very large corpus (4.26 billion unprocessed words) intended as a resource for future research, and the results highlight the potential of social media as a “natural laboratory” for sociolinguistic inquiry.

Abstract

This thesis investigates geographic dialect alignment in place-informed social media communities, focussing on New Zealand-related Reddit communities. By integrating qualitative analyses of user perceptions with computational methods, the study examines how language use reflects place identity and patterns of language variation and change based on user-informed lexical, morphosyntactic, and semantic variables. The findings show that users generally associate language with place, and place-related communities form a contiguous speech community, though alignment between geographic dialect communities and place-related communities remains complex. Advanced language modelling, including static and diachronic Word2Vec language embeddings, revealed semantic variation across place-based communities and meaningful semantic shifts within New Zealand English. The research involved the creation of a corpus containing 4.26 billion unprocessed words, which offers a valuable resource for future study. Overall, the results highlight the potential of social media as a natural laboratory for sociolinguistic inquiry.
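The abstract does not say how semantic shift was quantified from the diachronic Word2Vec embeddings. One common way to compare two separately trained embedding spaces without first aligning them is to check how many of a word's nearest neighbours the two spaces share. The sketch below illustrates that idea under stated assumptions and is not the thesis's method. The function names (`cosine`, `nearest_neighbours`, `neighbour_overlap`) and the toy vectors are hypothetical. In practice the vectors would come from Word2Vec models trained separately on each time slice or community.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(word, space, k):
    """Return the set of k words closest to `word` in `space`, excluding itself."""
    target = space[word]
    scored = [(cosine(target, vec), other) for other, vec in space.items() if other != word]
    # Sort by descending similarity; break ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return {other for _, other in scored[:k]}


def neighbour_overlap(word, space_a, space_b, k=10):
    """Fraction of `word`'s top-k neighbours shared by two embedding spaces.

    Both spaces are first restricted to their shared vocabulary so the
    neighbour lists are comparable. A low score suggests the word is used
    differently in the two corpora (e.g. two time slices or two communities).
    """
    shared = set(space_a) & set(space_b)
    a = {w: space_a[w] for w in shared}
    b = {w: space_b[w] for w in shared}
    return len(nearest_neighbours(word, a, k) & nearest_neighbours(word, b, k)) / k


# Hypothetical toy vectors standing in for two separately trained models
# (e.g. an earlier and a later time slice); real Word2Vec vectors have
# hundreds of dimensions.
space_early = {
    "target": [1.0, 0.0],
    "alpha": [0.9, 0.1],
    "beta": [0.8, 0.2],
    "gamma": [0.0, 1.0],
    "delta": [0.1, 0.9],
}
space_late = dict(space_early, target=[0.0, 1.0])  # only "target" has moved

print(neighbour_overlap("target", space_early, space_late, k=2))  # → 0.0
print(neighbour_overlap("alpha", space_early, space_late, k=2))   # → 0.5
```

Because neighbour sets are compared rather than raw vectors, this measure needs no rotation or alignment step between the two models. The trade-off is that a word's score can also drop when its neighbours move, as happens to `alpha` in the toy example.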