Not All Pretraining are Created Equal: Threshold Tuning and Class Weighting for Imbalanced Polarization Tasks in Low-Resource Settings
arXiv cs.LG / 3/26/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper presents Transformer-based solutions for the SemEval-2025 Polarization Shared Task, covering binary polarization detection plus two multi-label classification subtasks in both English and Swahili.
- It mitigates severe class imbalance in these low-resource settings with a class-weighted loss, iterative stratified data splitting, and per-label threshold tuning (sketched in the code example after this list).
- The approach combines multilingual and African-language-specialized models (mDeBERTa-v3-base, SwahBERT, AfriBERTa-large), with the best validation result reported for mDeBERTa-v3-base.
- Reported results reach 0.8032 macro-F1 on validation for binary detection and up to 0.556 macro-F1 on the multi-label subtasks, indicating competitive performance with clear room for improvement.
- Error analysis highlights ongoing difficulties with implicit polarization, code-switching, and separating heated political rhetoric from true polarization signals.
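
As a rough illustration of the three imbalance-handling techniques listed above (not the authors' released code), the following Python sketch combines `skmultilearn`'s iterative stratified split, a positive-class-weighted `BCEWithLogitsLoss`, and a grid search for per-label decision thresholds; the data, threshold grid, and variable names are illustrative assumptions.

```python
# A minimal sketch of the three imbalance-handling techniques named above,
# assuming a multi-label setup; data, names, and the threshold grid are
# illustrative, not taken from the paper.
import numpy as np
import torch
from sklearn.metrics import f1_score
from skmultilearn.model_selection import iterative_train_test_split

np.random.seed(0)

# Placeholder data: 1000 examples, 5 sparse labels (~10% positives each).
X = np.arange(1000).reshape(-1, 1)               # stand-in example ids
y = (np.random.rand(1000, 5) < 0.1).astype(int)  # sparse multi-label matrix

# 1) Iterative stratified split: keeps rare label combinations
#    proportionally represented across train and validation.
X_train, y_train, X_val, y_val = iterative_train_test_split(X, y, test_size=0.2)

# 2) Class-weighted loss: up-weight each label's positives by the
#    negative-to-positive ratio in the training split.
pos_counts = y_train.sum(axis=0)
neg_counts = len(y_train) - pos_counts
pos_weight = torch.tensor(neg_counts / np.clip(pos_counts, 1, None),
                          dtype=torch.float32)
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# 3) Per-label threshold tuning: sweep a grid over validation
#    probabilities and keep the threshold maximizing F1 for each label.
def tune_thresholds(probs, labels, grid=np.linspace(0.05, 0.95, 19)):
    thresholds = []
    for j in range(labels.shape[1]):
        best = max(grid, key=lambda t: f1_score(
            labels[:, j], (probs[:, j] >= t).astype(int), zero_division=0))
        thresholds.append(best)
    return np.array(thresholds)

# In practice `probs` would be sigmoid(model logits) on the validation set;
# random values stand in here so the sketch runs end to end.
probs = np.random.rand(len(y_val), y_val.shape[1])
per_label_thresholds = tune_thresholds(probs, y_val)
```

Under heavy imbalance a global 0.5 cutoff tends to suppress rare labels entirely, so tuning each label's threshold on validation data can recover substantial macro-F1 at no extra training cost.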