HMS-BERT: Hybrid Multi-Task Self-Training for Multilingual and Multi-Label Cyberbullying Detection
arXiv cs.CL · March 16, 2026
📰 News · Models & Research
Key Points
- HMS-BERT introduces a hybrid multi-task self-training framework built on a multilingual BERT backbone for multilingual and multi-label cyberbullying detection.
- The model combines contextual representations with handcrafted linguistic features and jointly optimizes a fine-grained multi-label abuse classification task and a three-class main classification task.
- An iterative self-training strategy with confidence-based pseudo-labeling addresses labeled data scarcity in low-resource languages to facilitate cross-lingual knowledge transfer.
- Experiments on four public datasets show strong performance, with macro F1-scores up to 0.9847 on the multi-label task and an accuracy of 0.6775 on the main classification task, with ablation studies confirming component effectiveness.
- Overall, the work addresses labeled-data scarcity and language diversity for multilingual, multi-label cyberbullying detection in realistic social media moderation settings.
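The confidence-based pseudo-labeling mentioned above can be sketched as a simple selection rule: after each training round, keep only the unlabeled samples whose top predicted class probability clears a threshold, and use the argmax class as the pseudo-label. The function name and the threshold value below are illustrative assumptions, not details from the paper.

```python
def select_pseudo_labels(probs, threshold=0.9):
    """Return (sample_index, pseudo_label) pairs for unlabeled samples
    whose top class probability meets the confidence threshold.
    `probs` is a list of per-class probability distributions."""
    selected = []
    for i, dist in enumerate(probs):
        best = max(range(len(dist)), key=dist.__getitem__)  # argmax class
        if dist[best] >= threshold:
            selected.append((i, best))
    return selected

# Toy predictions over 3 classes for 4 unlabeled samples.
probs = [
    [0.95, 0.03, 0.02],  # confident -> kept with label 0
    [0.40, 0.35, 0.25],  # uncertain -> discarded
    [0.05, 0.91, 0.04],  # confident -> kept with label 1
    [0.10, 0.10, 0.80],  # below 0.9 -> discarded
]
print(select_pseudo_labels(probs))  # → [(0, 0), (2, 1)]
```

In the iterative self-training loop described in the paper, the selected pairs would be added to the training set and the model retrained; lowering the threshold admits more pseudo-labels at the cost of noisier supervision.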