Robust Language Identification for Romansh Varieties

arXiv cs.CL / 3/18/2026

📰 News / Tools & Practical Usage / Models & Research

Key Points

  • The paper introduces a language identification system for Romansh varieties (idioms) and Rumantsch Grischun using an SVM-based approach.
  • It targets the challenging classification among Romansh idioms and a supra-regional variety, Rumantsch Grischun, as part of the problem.
  • The model is evaluated on a newly curated benchmark across two domains and achieves an average in-domain accuracy of 97%.
  • The classifier is publicly available and can enable applications such as idiom-aware spell checking or machine translation.
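
To make the "average in-domain accuracy" figure concrete, here is a minimal sketch of how such a number could be computed. It assumes an unweighted mean of per-domain accuracies; the summary does not say how the paper averages, and the domain names, labels, and predictions below are hypothetical placeholders, not the paper's data.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def average_in_domain_accuracy(results):
    """Return (unweighted mean accuracy, per-domain accuracies).

    `results` maps a domain name to a (gold_labels, predicted_labels) pair.
    Assumption: each domain counts equally, regardless of its size.
    """
    per_domain = {name: accuracy(gold, pred) for name, (gold, pred) in results.items()}
    return sum(per_domain.values()) / len(per_domain), per_domain


# Hypothetical toy labels for two domains (not taken from the paper).
toy_results = {
    "domain_1": (["a", "b", "a", "b"], ["a", "b", "a", "a"]),
    "domain_2": (["a", "b"], ["a", "b"]),
}
mean_accuracy, per_domain_accuracy = average_in_domain_accuracy(toy_results)
```

With an unweighted mean, a small domain influences the average as much as a large one; pooling all items before computing accuracy would weight domains by size instead.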

Abstract

The Romansh language has several regional varieties, called idioms, which sometimes have limited mutual intelligibility. Despite this linguistic diversity, there has been a lack of documented efforts to build a language identification (LID) system that can distinguish between these idioms. Since Romansh LID should also be able to recognize Rumantsch Grischun, a supra-regional variety that combines elements of several idioms, this makes for a novel and interesting classification problem. In this paper, we present a LID system for Romansh idioms based on an SVM approach. We evaluate our model on a newly curated benchmark across two domains and find that it reaches an average in-domain accuracy of 97%, enabling applications such as idiom-aware spell checking or machine translation. Our classifier is publicly available.
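
As an illustration of what an SVM-based LID system can look like, the following is a minimal sketch using scikit-learn. The abstract does not describe the paper's features or hyperparameters, so the character n-gram TF-IDF features and the `LinearSVC` settings are assumptions, and this is not the authors' released classifier. The training strings are synthetic placeholders rather than real Romansh text, and the labels `variety_a` and `variety_b` stand in for the actual idioms and Rumantsch Grischun.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic placeholder samples (NOT real Romansh); labels are hypothetical.
TOY_TRAINING_DATA = [
    ("xaxa xaxo xaxi", "variety_a"),
    ("xaxu xaxa oxax", "variety_a"),
    ("kiki koki kuki", "variety_b"),
    ("kika kiko ikik", "variety_b"),
]


def train_lid(samples):
    """Fit a character n-gram TF-IDF + linear SVM pipeline on (text, label) pairs."""
    texts, labels = zip(*samples)
    model = make_pipeline(
        # Assumed feature choice: character 1- to 3-grams within word boundaries.
        TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
        LinearSVC(random_state=0),
    )
    model.fit(list(texts), list(labels))
    return model


model = train_lid(TOY_TRAINING_DATA)
predictions = model.predict(["xaxa xaxi", "kuki kiko"])
```

Character n-grams are a common choice for separating closely related varieties because they pick up spelling patterns without requiring a tokenizer or lexicon; whether the paper relies on them is not stated in the abstract.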