Dependency Parsing Across the Resource Spectrum: Evaluating Architectures on High and Low-Resource Languages

arXiv cs.CL / 5/5/2026


Key Points

  • The study evaluates four dependency parsers—Biaffine LSTM, Stack-Pointer Network, AfroXLMR-large, and RemBERT—across ten typologically diverse languages, emphasizing low-resource African languages.
  • Results show that the Biaffine LSTM consistently performs better than the transformer-based models when training data is scarce, while the transformers regain the lead as more data becomes available.
  • The performance “crossover” point occurs in a data-resource range that is typical for treebanks of under-resourced languages.
  • Morphological complexity, measured with MATTR (moving-average type-token ratio), is identified as an additional predictor of how much transformer models underperform relative to the simpler architectures after accounting for corpus size; see the sketch after this list for how MATTR is typically computed.
  • The findings suggest Biaffine LSTM may be a better default choice for syntactic tool development in low-resource settings until enough annotated data exists to exploit transformer strengths.
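
For readers unfamiliar with the metric, MATTR averages the type-token ratio over a sliding window of fixed size, which makes it less sensitive to text length than plain TTR. The snippet below is a minimal illustrative implementation; the window size and whitespace tokenization are assumptions for demonstration, not the settings used in the paper.

```python
# Minimal sketch of MATTR (Moving-Average Type-Token Ratio).
# Window size and tokenization here are illustrative assumptions.
from collections import Counter

def mattr(tokens, window=500):
    """Average type-token ratio over a sliding window of fixed size."""
    if len(tokens) < window:
        # Fall back to plain TTR for texts shorter than one window.
        return len(set(tokens)) / len(tokens)
    counts = Counter(tokens[:window])
    ratios = [len(counts) / window]
    for i in range(window, len(tokens)):
        # Slide the window by one token: drop the oldest, add the newest.
        old, new = tokens[i - window], tokens[i]
        counts[old] -= 1
        if counts[old] == 0:
            del counts[old]
        counts[new] += 1
        ratios.append(len(counts) / window)
    return sum(ratios) / len(ratios)

print(mattr("the cat sat on the mat the cat".split(), window=4))
```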

Abstract

Transformer-based models achieve state-of-the-art dependency parsing for high-resource languages, yet their advantage over simpler architectures in low-resource settings remains poorly understood. We evaluate four parsers (the Biaffine LSTM, Stack-Pointer Network, AfroXLMR-large, and RemBERT) across ten typologically diverse languages, with a focus on low-resource African languages. We find that the Biaffine LSTM consistently outperforms transformer models in low-resource regimes, with transformers recovering their advantage as training data increases. The crossover falls within a resource range typical of treebanks for under-resourced languages. Morphological complexity (measured via MATTR) emerges as a significant secondary predictor of transformers' relative disadvantage after controlling for corpus size. These results indicate that the Biaffine LSTM may be better suited for syntactic tool development in low-resource regimes until sufficient annotated data is available to leverage the representational capacity of pre-trained transformers.
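
For context on the strongest non-transformer baseline in the study: a biaffine parser scores every head-dependent pair with a bilinear interaction over encoder states, in the style of Dozat and Manning's deep biaffine attention. The sketch below is an illustrative PyTorch module only; the dimensions, MLP setup, and the assumption of BiLSTM-encoded inputs are ours, not the authors' exact configuration.

```python
# Illustrative biaffine arc scorer over BiLSTM states (Dozat & Manning style).
# Dimensions and initialization are assumptions, not the paper's setup.
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    def __init__(self, enc_dim=400, arc_dim=500):
        super().__init__()
        self.head_mlp = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        # U captures head-dependent interactions; the extra row handles the
        # bias feature appended to the dependent representation below.
        self.U = nn.Parameter(torch.randn(arc_dim + 1, arc_dim) * 0.01)

    def forward(self, enc):                      # enc: (batch, seq, enc_dim)
        dep = self.dep_mlp(enc)                  # (batch, seq, arc_dim)
        head = self.head_mlp(enc)                # (batch, seq, arc_dim)
        ones = dep.new_ones(dep.shape[:-1] + (1,))
        dep = torch.cat([dep, ones], dim=-1)     # append bias feature
        # scores[b, i, j] = score of token j being the head of token i
        return dep @ self.U @ head.transpose(1, 2)

enc = torch.randn(2, 7, 400)                     # e.g. BiLSTM outputs
print(BiaffineArcScorer()(enc).shape)            # torch.Size([2, 7, 7])
```

At parse time the arc scores are typically combined with a separate label scorer and decoded with a maximum spanning tree algorithm; the module above only shows the arc-scoring step that gives the architecture its name.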