Transfer Learning for an Endangered Slavic Variety: Dependency Parsing in Pomak Across Contact-Shaped Dialects

arXiv cs.CL / 3/31/2026

Key Points

  • The paper introduces new research resources and baseline dependency-parsing experiments for Pomak, an endangered Eastern South Slavic language with strong dialect variation and limited standardization.
  • It tests cross-dialect transfer by training a parser on the Pomak Universal Dependencies treebank, which is derived primarily from the variety spoken in Greece, and evaluating zero-shot performance on the variety spoken in Turkey (Uzunköprü).
  • The study quantifies how phonological and morphosyntactic differences between dialects affect parsing accuracy under zero-shot transfer.
  • A new manually annotated Turkish-variety Pomak corpus of 650 sentences is introduced, and the authors show that targeted fine-tuning yields substantial accuracy gains despite the dataset's small size.
  • Cross-variety transfer learning that combines the two dialects improves performance further, beyond targeted fine-tuning alone.

Abstract

This paper presents new resources and baselines for dependency parsing in Pomak, an endangered Eastern South Slavic language with substantial dialectal variation and no widely adopted standard. We focus on the variety spoken in Turkey (Uzunköprü) and ask how well a dependency parser trained on the existing Pomak Universal Dependencies treebank, which was built primarily from the variety that is spoken in Greece, transfers across dialects. We run two experimental phases. First, we train a parser on the Greek-variety UD data and evaluate zero-shot transfer to Turkish-variety Pomak, quantifying the impact of phonological and morphosyntactic differences. Second, we introduce a new manually annotated Turkish-variety Pomak corpus of 650 sentences and show that, despite its small size, targeted fine-tuning substantially improves accuracy; performance is further boosted by cross-variety transfer learning that combines the two dialects.
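
The abstract speaks of parsing "accuracy" without naming a metric. For Universal Dependencies treebanks, accuracy is commonly reported as unlabeled and labeled attachment score (UAS and LAS); that the paper uses these is an assumption here, not something the abstract states. The sketch below shows how such scores could be computed from gold and predicted CoNLL-U text with identical tokenization. The function names `read_conllu` and `attachment_scores` are hypothetical and chosen only for illustration.

```python
def read_conllu(text):
    """Read CoNLL-U text into sentences, each a list of (head, deprel) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        # Column 7 is HEAD and column 8 is DEPREL (1-indexed) in CoNLL-U.
        current.append((int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold_text, pred_text):
    """Return (UAS, LAS) as fractions, assuming identical tokenization."""
    gold = read_conllu(gold_text)
    pred = read_conllu(pred_text)
    if len(gold) != len(pred):
        raise ValueError("gold and predicted files differ in sentence count")
    total = uas_hits = las_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (g_head, g_rel), (p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                uas_hits += 1  # correct head, label ignored
                if g_rel == p_rel:
                    las_hits += 1  # correct head and correct label
    if total == 0:
        raise ValueError("no tokens to score")
    return uas_hits / total, las_hits / total
```

Under this assumption, the zero-shot phase would score a parser trained on the Greek-variety treebank against the Turkish-variety gold annotations, and the second phase would repeat the same comparison after fine-tuning, so the two settings are measured on the same footing.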