Neural Grammatical Error Correction for Romanian

arXiv cs.CL / 4/28/2026

Key Points

The paper introduces a first Grammatical Error Correction (GEC) corpus for Romanian, containing 10k sentence pairs, addressing the scarcity of non-English GEC resources.
It adapts the German ERRANT (ERRor ANnotation Toolkit) scorer for Romanian to support edit extraction and proper evaluation of the corpus.
Experiments with multiple neural models and pretraining strategies show that pretraining is effective for low-resource GEC, improving on a small baseline Transformer trained only on the Romanian dataset.
The best results come from pretraining a larger Transformer on artificially generated data and then fine-tuning on the real corpus, reaching an F0.5 of 53.76 versus 44.38 for the baseline.
The authors propose an artificial data generation method that is designed to be extensible to other languages using only a POS tagger.
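To make the evaluation terms above concrete, here is a minimal sketch of edit-based scoring with F0.5, which weights precision more heavily than recall. It is not ERRANT: the alignment uses Python's `difflib` rather than ERRANT's linguistically informed alignment, and the function names `extract_edits`, `f_beta`, and `score` are hypothetical. The Romanian sentence in the usage note is an invented illustration, not taken from the corpus.

```python
from difflib import SequenceMatcher


def extract_edits(source: str, target: str) -> set:
    """Return token-level edits as (start, end, replacement) tuples.

    Assumption: whitespace tokenization and a plain difflib alignment,
    which is a simplification of what ERRANT does.
    """
    src, tgt = source.split(), target.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Span [i1, i2) in the source is rewritten as tgt[j1:j2].
            edits.add((i1, i2, " ".join(tgt[j1:j2])))
    return edits


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit counts; beta=0.5 favors precision over recall."""
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


def score(source: str, hypothesis: str, reference: str, beta: float = 0.5) -> float:
    """Compare the system's edits against the reference edits."""
    hyp = extract_edits(source, hypothesis)
    ref = extract_edits(source, reference)
    tp = len(hyp & ref)  # edits the system got right
    fp = len(hyp - ref)  # edits the system proposed but the reference lacks
    fn = len(ref - hyp)  # reference edits the system missed
    return f_beta(tp, fp, fn, beta)
```

For example, `extract_edits("Ei merge la școală", "Ei merg la școală")` yields the single edit `(1, 2, "merg")`. Reported corpus-level scores are typically computed from counts accumulated over all sentences rather than averaged per sentence.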

Abstract

Resources for Grammatical Error Correction (GEC) in non-English languages are scarce, while available spellcheckers in these languages are mostly limited to simple corrections and rules. In this paper we introduce a first GEC corpus for Romanian consisting of 10k pairs of sentences. In addition, the German version of the ERRANT (ERRor ANnotation Toolkit) scorer was adapted for Romanian to analyze this corpus and extract the edits needed for evaluation. We experimented with multiple neural models and with pretraining strategies, which proved effective for GEC in low-resource settings. Our baseline consists of a small Transformer model trained only on the GEC dataset (F0.5 of 44.38), whereas the best performing model is produced by pretraining a larger Transformer model on artificially generated data, followed by finetuning on the actual corpus (F0.5 of 53.76). The proposed method for generating additional training examples is easily extensible and can be applied to any language, as it requires only a POS tagger.
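The abstract does not spell out how the artificial data is generated, only that a POS tagger is the sole requirement. The sketch below shows one way such a scheme could work, purely as an assumption and not as the authors' procedure: tokens are swapped for other words carrying the same POS tag, which yields (corrupted, clean) sentence pairs for pretraining. The input is assumed to be already tagged by some external tagger, and the names `build_pos_vocab` and `corrupt`, the `rate` parameter, and the tag labels are all hypothetical.

```python
import random
from collections import defaultdict


def build_pos_vocab(tagged_sentences):
    """Group the words seen in the tagged data by their POS tag."""
    vocab = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            vocab[tag].add(word)
    # Sorted lists keep sampling reproducible for a seeded generator.
    return {tag: sorted(words) for tag, words in vocab.items()}


def corrupt(tagged_sentence, pos_vocab, rate=0.15, rng=None):
    """Return a (noisy, clean) pair for one POS-tagged sentence.

    Assumption: an error is simulated by replacing a token with a
    different word that has the same POS tag, with probability `rate`.
    """
    rng = rng or random.Random()
    noisy = []
    for word, tag in tagged_sentence:
        candidates = [w for w in pos_vocab.get(tag, []) if w != word]
        if candidates and rng.random() < rate:
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(word)
    clean = " ".join(word for word, _ in tagged_sentence)
    return " ".join(noisy), clean
```

A same-tag substitution can occasionally produce another valid sentence, so a real pipeline would likely need finer-grained tags or filtering before the pairs are used for pretraining.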
