Neural Grammatical Error Correction for Romanian
arXiv cs.CL / 4/28/2026
Key Points
- The paper introduces a first Grammatical Error Correction (GEC) corpus for Romanian, containing 10k sentence pairs, addressing the scarcity of non-English GEC resources.
- It adapts the German version of the ERRANT (ERRor ANnotation Toolkit) scorer to Romanian so that edits can be extracted from the corpus and used for evaluation.
- Experiments with multiple neural models and pretraining strategies show that pretraining is effective for low-resource GEC: the pretrained models outperform a baseline small Transformer trained only on the Romanian dataset.
- The best results come from pretraining a larger Transformer on artificially generated data and then fine-tuning on the real corpus, reaching an F0.5 of 53.76 versus 44.38 for the baseline.
- The authors propose an artificial data generation method that is designed to be extensible to other languages using only a POS tagger.
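The F0.5 scores quoted above are edit-level F-scores with beta = 0.5, which weights precision twice as heavily as recall. The following is a minimal sketch of that computation, not the authors' scorer: it assumes edits have already been extracted and represents each one as a hypothetical `(start, end, replacement)` tuple, and the function names `f_beta` and `score_edits` are illustrative only.

```python
from typing import Set, Tuple

# Hypothetical edit representation: (start token index, end token index, replacement text).
Edit = Tuple[int, int, str]


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit-level counts; beta < 1 favours precision over recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def score_edits(hyp_edits: Set[Edit], ref_edits: Set[Edit], beta: float = 0.5) -> float:
    """Compare system edits against reference edits by exact match."""
    tp = len(hyp_edits & ref_edits)   # edits the system proposed that are in the reference
    fp = len(hyp_edits - ref_edits)   # proposed edits that are not in the reference
    fn = len(ref_edits - hyp_edits)   # reference edits the system missed
    return f_beta(tp, fp, fn, beta)


# Toy example with made-up edits (assumption: exact-match comparison).
reference = {(1, 2, "a"), (4, 5, "b"), (7, 8, "c"), (9, 10, "d")}
hypothesis = {(1, 2, "a"), (4, 5, "b")}
score = score_edits(hypothesis, reference)
print(f"F0.5 = {score:.4f}")
```

In the toy example the system proposes two edits, both correct, but misses two reference edits, so precision is 1.0 and recall is 0.5. Because beta = 0.5 rewards precision, the resulting score sits closer to the precision than a balanced F1 would.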