Dependency Parsing Across the Resource Spectrum: Evaluating Architectures on High and Low-Resource Languages

arXiv cs.CL / 5/5/2026


Key Points

  • The study evaluates four dependency parsers—Biaffine LSTM, Stack-Pointer Network, AfroXLMR-large, and RemBERT—across ten typologically diverse languages, emphasizing low-resource African languages.
  • Results show that the Biaffine LSTM consistently performs better than the transformer-based models when training data is scarce, while the transformers regain the lead as more data becomes available.
  • The performance “crossover” point occurs in a data-resource range that is typical for treebanks of under-resourced languages.
  • Morphological complexity, measured with MATTR (moving-average type-token ratio), is identified as an additional predictor of how much transformer models underperform relative to the simpler architectures after accounting for corpus size; see the sketch after this list for how MATTR is typically computed.
  • The findings suggest Biaffine LSTM may be a better default choice for syntactic tool development in low-resource settings until enough annotated data exists to exploit transformer strengths.
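
For readers unfamiliar with the metric, MATTR averages the type-token ratio over a sliding window of fixed size, which makes it less sensitive to text length than plain TTR. The snippet below is a minimal illustrative implementation; the window size and whitespace tokenization are assumptions for demonstration, not the settings used in the paper.

```python
# Minimal sketch of MATTR (Moving-Average Type-Token Ratio).
# Window size and tokenization here are illustrative assumptions.
from collections import Counter

def mattr(tokens, window=500):
    """Average type-token ratio over a sliding window of fixed size."""
    if len(tokens) < window:
        # Fall back to plain TTR for texts shorter than one window.
        return len(set(tokens)) / len(tokens)
    counts = Counter(tokens[:window])
    ratios = [len(counts) / window]
    for i in range(window, len(tokens)):
        # Slide the window by one token: drop the oldest, add the newest.
        old, new = tokens[i - window], tokens[i]
        counts[old] -= 1
        if counts[old] == 0:
            del counts[old]
        counts[new] += 1
        ratios.append(len(counts) / window)
    return sum(ratios) / len(ratios)

print(mattr("the cat sat on the mat the cat".split(), window=4))
```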

Abstract

Transformer-based models achieve state-of-the-art dependency parsing for high-resource languages, yet their advantage over simpler architectures in low-resource settings remains poorly understood. We evaluate four parsers (the Biaffine LSTM, Stack-Pointer Network, AfroXLMR-large, and RemBERT) across ten typologically diverse languages, with a focus on low-resource African languages. We find that the Biaffine LSTM consistently outperforms transformer models in low-resource regimes, with transformers recovering their advantage as training data increases. The crossover falls within a resource range typical of treebanks for under-resourced languages. Morphological complexity (measured via MATTR) emerges as a significant secondary predictor of transformers' relative disadvantage after controlling for corpus size. These results indicate that the Biaffine LSTM may be better suited for syntactic tool development in low-resource regimes until sufficient annotated data is available to leverage the representational capacity of pre-trained transformers.
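
For context on the strongest non-transformer baseline in the study: a biaffine parser scores every head-dependent pair with a bilinear interaction over encoder states, in the style of Dozat and Manning's deep biaffine attention. The sketch below is an illustrative PyTorch module only; the dimensions, MLP setup, and the assumption of BiLSTM-encoded inputs are ours, not the authors' exact configuration.

```python
# Illustrative biaffine arc scorer over BiLSTM states (Dozat & Manning style).
# Dimensions and initialization are assumptions, not the paper's setup.
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    def __init__(self, enc_dim=400, arc_dim=500):
        super().__init__()
        self.head_mlp = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        # U captures head-dependent interactions; the extra row handles the
        # bias feature appended to the dependent representation below.
        self.U = nn.Parameter(torch.randn(arc_dim + 1, arc_dim) * 0.01)

    def forward(self, enc):                      # enc: (batch, seq, enc_dim)
        dep = self.dep_mlp(enc)                  # (batch, seq, arc_dim)
        head = self.head_mlp(enc)                # (batch, seq, arc_dim)
        ones = dep.new_ones(dep.shape[:-1] + (1,))
        dep = torch.cat([dep, ones], dim=-1)     # append bias feature
        # scores[b, i, j] = score of token j being the head of token i
        return dep @ self.U @ head.transpose(1, 2)

enc = torch.randn(2, 7, 400)                     # e.g. BiLSTM outputs
print(BiaffineArcScorer()(enc).shape)            # torch.Size([2, 7, 7])
```

At parse time the arc scores are typically combined with a separate label scorer and decoded with a maximum spanning tree algorithm; the module above only shows the arc-scoring step that gives the architecture its name.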