THIVLVC: Retrieval Augmented Dependency Parsing for Latin

arXiv cs.CL / 4/8/2026

Key Points

  • THIVLVC is a two-stage retrieval-augmented dependency parsing system for Latin that retrieves structurally similar sentences from the CIRCSE treebank using length and POS n-gram similarity.
  • It then uses an LLM prompted with the retrieved examples and UD annotation guidelines to refine a baseline dependency parse produced by UDPipe.
  • The authors submit two variants—without retrieval and with retrieval (RAG)—to isolate the effect of the retrieval step.
  • On Seneca poetry, THIVLVC improves CLAS by +17 points over the UDPipe baseline, while on Aquinas prose it yields a smaller +1.5 CLAS gain.
  • A double-blind error analysis of 300 divergences between the system and the gold standard finds that, among the cases the annotators decided unanimously, 53.3% favor THIVLVC over the gold annotation, pointing to annotation inconsistencies within and across treebanks.
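
The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the summary says only that retrieval uses sentence length and POS n-gram similarity, so the bigram order, the multiset Jaccard overlap, the length ratio, the equal weighting (`alpha`), and all function names here are assumptions made for the example.

```python
from collections import Counter


def pos_ngrams(tags, n=2):
    """Count the POS n-grams of a tag sequence (multiset of tuples)."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def ngram_similarity(a, b, n=2):
    """Multiset Jaccard overlap of POS n-grams (an assumed measure)."""
    ca, cb = pos_ngrams(a, n), pos_ngrams(b, n)
    union = sum((ca | cb).values())
    return sum((ca & cb).values()) / union if union else 0.0


def length_similarity(a, b):
    """Ratio of the shorter to the longer sentence length (assumed)."""
    if not a or not b:
        return 0.0
    return min(len(a), len(b)) / max(len(a), len(b))


def retrieve(query_tags, treebank, k=3, alpha=0.5, n=2):
    """Return the ids of the k treebank sentences most similar to the query.

    treebank maps a sentence id to its list of POS tags. alpha is a
    hypothetical weight balancing length against n-gram similarity.
    """
    scored = []
    for sent_id, tags in treebank.items():
        score = (alpha * length_similarity(query_tags, tags)
                 + (1 - alpha) * ngram_similarity(query_tags, tags, n))
        scored.append((score, sent_id))
    # Highest score first; ties broken by sentence id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sent_id for _, sent_id in scored[:k]]
```

In a pipeline like the one described, the retrieved sentences and their gold annotations would then be placed in the LLM prompt alongside the UDPipe parse; how the prompt is assembled is not specified in this summary.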

Abstract

We describe THIVLVC, a two-stage system for the EvaLatin 2026 Dependency Parsing task. Given a Latin sentence, we retrieve structurally similar entries from the CIRCSE treebank using sentence length and POS n-gram similarity, then prompt a large language model to refine the baseline parse from UDPipe using the retrieved examples and UD annotation guidelines. We submit two configurations: one without retrieval and one with retrieval (RAG). On poetry (Seneca), THIVLVC improves CLAS by +17 points over the UDPipe baseline; on prose (Thomas Aquinas), the gain is +1.5 CLAS. A double-blind error analysis of 300 divergences between our system and the gold standard reveals that, among unanimous annotator decisions, 53.3% favour THIVLVC, showing annotation inconsistencies both within and across treebanks.
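
For readers unfamiliar with the metric, CLAS (content-word labeled attachment score) can be sketched as below. The sketch assumes the definition used in the CoNLL 2018 shared task: an F1 score over content words only, where a word counts as correct if its head and universal relation both match the gold. It also assumes gold tokenization, so tokens align one to one. The abstract does not state which evaluation script was used, and the function names here are illustrative.

```python
# Relations treated as functional (or punctuation) and so excluded from
# CLAS, following the CoNLL 2018 convention assumed for this sketch.
FUNCTIONAL = {"aux", "case", "cc", "clf", "cop", "det", "mark", "punct"}


def universal(deprel):
    """Strip a language-specific subtype, e.g. 'obl:arg' -> 'obl'."""
    return deprel.split(":")[0]


def is_content(deprel):
    """True if the relation marks a content word."""
    return universal(deprel) not in FUNCTIONAL


def clas(gold, system):
    """CLAS as an F1 score in [0, 1].

    gold and system are lists of (head, deprel) pairs, one per token,
    over the same tokenization.
    """
    if len(gold) != len(system):
        raise ValueError("this sketch assumes identical tokenization")
    gold_total = sum(is_content(rel) for _, rel in gold)
    system_total = sum(is_content(rel) for _, rel in system)
    correct = sum(
        1
        for (g_head, g_rel), (s_head, s_rel) in zip(gold, system)
        if is_content(g_rel) and is_content(s_rel)
        and g_head == s_head and universal(g_rel) == universal(s_rel)
    )
    if gold_total == 0 or system_total == 0 or correct == 0:
        return 0.0
    precision = correct / system_total
    recall = correct / gold_total
    return 2 * precision * recall / (precision + recall)
```

Because function words and punctuation are ignored, CLAS is more sensitive than plain LAS to errors on arguments and modifiers, which may be part of why the reported poetry and prose gains differ so much in size.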