RUMLEM: A Dictionary-Based Lemmatizer for Romansh
arXiv cs.CL / 4/14/2026
Key Points
- The paper introduces RUMLEM, a dictionary-based lemmatizer designed specifically for Romansh. It covers the language's five main regional varieties plus Rumantsch Grischun, the supra-regional written standard.
- By relying on comprehensive, community-driven morphological databases, RUMLEM achieves coverage of roughly 77–84% of words in typical Romansh text.
- The approach is variety-aware: because each variety has its own database, the lemmatizer can also be used to identify which variety a text is written in.
- Experiments on 30,000 Romansh texts show RUMLEM identifies the correct variety in 95% of cases.
- A proof of concept further shows that the lemmatizer's output can also help distinguish Romansh from non-Romansh text.
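The summary does not say how RUMLEM turns per-variety dictionary lookups into a variety label or a Romansh/non-Romansh decision. The following is a minimal sketch that assumes two rules. First, the variety whose dictionary covers the largest share of a text's tokens wins. Second, a text whose best coverage falls below a threshold counts as non-Romansh. The names `analyze`, `coverage`, `TOY_DATABASES`, the 0.5 threshold, and every word form below are hypothetical illustrations. They are not RUMLEM's API, algorithm, or data.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical toy databases: one form -> lemma table per variety.
# The word forms are invented placeholders, NOT real Romansh entries
# and NOT taken from the RUMLEM databases.
TOY_DATABASES: Dict[str, Dict[str, str]] = {
    "variety_a": {"blirs": "blir", "blira": "blir", "tam": "tam"},
    "variety_b": {"blers": "bler", "blera": "bler", "tam": "tam"},
}


@dataclass
class Analysis:
    variety: Optional[str]       # None means "not classified as any known variety"
    coverage: float              # share of tokens found in the best variety's database
    lemmas: List[Optional[str]]  # None for tokens missing from that database


def tokenize(text: str) -> List[str]:
    # Unicode-aware \w keeps accented letters such as "è" or "ü" inside tokens.
    return re.findall(r"\w+", text.lower())


def coverage(tokens: List[str], db: Dict[str, str]) -> float:
    # Fraction of tokens that have an entry in the dictionary.
    if not tokens:
        return 0.0
    return sum(tok in db for tok in tokens) / len(tokens)


def analyze(text: str, databases: Dict[str, Dict[str, str]],
            threshold: float = 0.5) -> Analysis:
    tokens = tokenize(text)
    # Score every variety by how much of the text its dictionary covers.
    scores = {name: coverage(tokens, db) for name, db in databases.items()}
    # max() keeps the first variety listed when scores tie.
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        # Too few known words: treat the text as non-Romansh (hypothetical rule).
        return Analysis(None, scores[best], [None] * len(tokens))
    db = databases[best]
    # Dictionary lookup is the lemmatization step: form -> lemma, or None if unknown.
    return Analysis(best, scores[best], [db.get(tok) for tok in tokens])


if __name__ == "__main__":
    print(analyze("Blirs tam, blira.", TOY_DATABASES))
    print(analyze("hello world", TOY_DATABASES))
```

Scoring by coverage reuses the same lookup that produces the lemmas, so lemmatization and classification come from one pass over the text. The real system may weight, smooth, or break ties differently, and its threshold (if any) is not given in the summary.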