LombardoGraphia: Automatic Classification of Lombard Orthography Variants
arXiv cs.CL / 3/31/2026
Key Points
- The paper addresses the lack of a unified orthographic standard in Lombard, noting that multiple orthography variants complicate NLP data creation and model training.
- It introduces LombardoGraphia, a curated corpus of 11,186 Lombard Wikipedia samples tagged with 9 orthographic variants, designed specifically for orthographic analysis.
- The authors propose and evaluate both traditional and neural classification approaches, training 24 models using different features and encoding levels.
- The best-performing models reach 96.06% overall accuracy and 85.78% average class accuracy, but minority-class performance is limited by data imbalance.
- The work aims to provide foundational infrastructure for developing variety-aware NLP resources for under-resourced languages such as Lombard.
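
The gap between the two reported figures (96.06% overall accuracy versus 85.78% average class accuracy) is what one would expect under class imbalance. The sketch below illustrates the difference on toy data. It assumes that "average class accuracy" means the unweighted mean of per-class recall; the summary does not state the paper's exact definition. The labels "A" and "B" and both function names are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_class_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed definition, see lead-in)."""
    totals = defaultdict(int)  # samples per true class
    hits = defaultdict(int)    # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, made-up data: "A" is a majority variant, "B" a minority variant.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]  # one of the two "B" samples is misclassified

print(overall_accuracy(y_true, y_pred))        # 9 of 10 samples correct
print(average_class_accuracy(y_true, y_pred))  # mean of 8/8 and 1/2
```

A single error on the minority class barely moves the overall figure but halves that class's recall, which pulls the class-averaged figure down much further.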