LuxBorrow: From Pompier to Pompjee, Tracing Borrowing in Luxembourgish
arXiv cs.CL / 3/12/2026
Key Points
- LuxBorrow introduces a borrowing-first analysis of Luxembourgish (LU) news from 1999 to 2025, using a pipeline that combines sentence-level language identification (LU/DE/FR/EN) with a token-level borrowing resolver, lemmatization, a loanword registry, and morphological/orthographic rules.
- The study shows Luxembourgish remains the matrix language across all documents, but multilingual practice is pervasive, with 77.1% of articles containing at least one donor language and 65.4% drawing on three or four donors.
- Token-level adaptations total 25,444 instances: most are morphological (63.8%) or orthographic (35.9%), with a small lexical component (0.3%). The most frequent rules are orthographic changes such as on->oun and eur->er.
- The authors advocate borrowing-centric evaluation metrics (such as borrowed token/type rates, donor entropy over borrowed items, and assimilation ratios) rather than relying solely on document-level mixing indices.
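
The borrowing-centric metrics named in the last point can be illustrated with a minimal sketch. The summary does not give the paper's exact definitions, so the formulas here are assumptions: rates are plain proportions, donor entropy is Shannon entropy (in bits) over the donor labels of borrowed tokens, and the assimilation ratio is the share of borrowed tokens flagged as adapted. The `Token` class, the `borrowing_metrics` function, and the toy annotations are hypothetical and not taken from the LuxBorrow pipeline.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical token record; not the paper's data format."""
    form: str
    donor: Optional[str] = None  # e.g. "FR", "DE", "EN"; None means native
    adapted: bool = False        # True if the borrowed form shows adaptation


def borrowing_metrics(tokens):
    """Compute assumed versions of the borrowing-centric metrics."""
    borrowed = [t for t in tokens if t.donor is not None]
    types = {t.form.lower() for t in tokens}
    borrowed_types = {t.form.lower() for t in borrowed}

    # Shannon entropy (bits) of the donor distribution over borrowed tokens.
    donor_counts = Counter(t.donor for t in borrowed)
    total = sum(donor_counts.values())
    entropy = sum(
        (c / total) * math.log2(total / c) for c in donor_counts.values()
    )

    return {
        "borrowed_token_rate": len(borrowed) / len(tokens) if tokens else 0.0,
        "borrowed_type_rate": len(borrowed_types) / len(types) if types else 0.0,
        "donor_entropy_bits": entropy,
        "assimilation_ratio": (
            sum(t.adapted for t in borrowed) / len(borrowed) if borrowed else 0.0
        ),
    }


# Toy input with made-up annotations, for illustration only.
sample = [
    Token("den"),
    Token("Pompjee", donor="FR", adapted=True),
    Token("kënnt"),
    Token("mam"),
    Token("Camion", donor="FR"),
    Token("an"),
    Token("den"),
    Token("Computer", donor="EN"),
]
metrics = borrowing_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike a document-level mixing index, each of these quantities is computed only from tokens identified as borrowed, so a text with a few heavily adapted loans and a text with many unadapted ones would be distinguished even if their overall language mix looked similar.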