MKJ at SemEval-2026 Task 9: A Comparative Study of Generalist, Specialist, and Ensemble Strategies for Multilingual Polarization

arXiv cs.CL / April 24, 2026


Key Points

  • The paper reports a systematic, cross-lingual study of polarization detection for SemEval-2026 Task 9 (Subtask 1) covering 22 languages, comparing generalist, specialist, and ensemble approaches.
  • It finds that a strong multilingual generalist (e.g., XLM-RoBERTa) can work well when its tokenization matches the target text, but performance drops on distinct scripts, where monolingual specialists offer substantial improvements.
  • The authors propose a language-adaptive framework that selects between multilingual generalists, language-specific specialists, and hybrid ensembles according to development-set performance rather than committing to a single universal model.
  • Cross-lingual augmentation using NLLB-200 shows mixed outcomes, frequently underperforming native architecture selection and sometimes harming performance on morphologically rich languages.
  • The proposed final system reaches a macro-averaged F1 of 0.796 and an average accuracy of 0.826 across all 22 tracks, with code and test predictions released publicly.
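The language-adaptive selection described above can be sketched as a simple per-track argmax over development-set scores. This is a hypothetical illustration, not the authors' code: the function name, candidate labels, and F1 values are invented for demonstration.

```python
# Hypothetical sketch of per-language system selection: for each language
# track, evaluate candidate systems (generalist, specialist, ensemble) on
# the development set and keep the best-scoring one. All names and scores
# below are illustrative placeholders, not results from the paper.

def select_per_language(dev_f1):
    """Map {language: {candidate: dev macro-F1}} to {language: best candidate}."""
    return {lang: max(scores, key=scores.get) for lang, scores in dev_f1.items()}

# Illustrative dev-set scores for two tracks.
dev_f1 = {
    "english": {"generalist": 0.84, "specialist": 0.82, "ensemble": 0.85},
    "khmer":   {"generalist": 0.61, "specialist": 0.74, "ensemble": 0.72},
}

chosen = select_per_language(dev_f1)
print(chosen)  # → {'english': 'ensemble', 'khmer': 'specialist'}
```

The design point is that the framework commits to no single universal architecture; the choice is made independently per track, driven only by held-out development performance.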

Abstract

We present a systematic study of multilingual polarization detection across 22 languages for SemEval-2026 Task 9 (Subtask 1), contrasting multilingual generalists with language-specific specialists and hybrid ensembles. While a standard generalist like XLM-RoBERTa suffices when its tokenizer aligns with the target text, it may struggle with distinct scripts (e.g., Khmer, Odia) where monolingual specialists yield significant gains. Rather than enforcing a single universal architecture, we adopt a language-adaptive framework that switches between multilingual generalists, language-specific specialists, and hybrid ensembles based on development performance. Additionally, cross-lingual augmentation via NLLB-200 yielded mixed results, often underperforming native architecture selection and degrading morphologically rich tracks. Our final system achieves an overall macro-averaged F1 score of 0.796 and an average accuracy of 0.826 across all 22 tracks. Code and final test predictions are publicly available at: https://github.com/Maziarkiani/SemEval2026-Task9-Subtask1-Polarization.