ViGoEmotions: A Benchmark Dataset For Fine-grained Emotion Detection on Vietnamese Texts
arXiv cs.CL / 3/27/2026
Key Points
- The paper introduces ViGoEmotions, a Vietnamese social media dataset with 20,664 comments annotated with 27 fine-grained emotion labels for emotion detection research.
- Eight pre-trained Transformer-based models are benchmarked under three preprocessing strategies: preserving the original emojis, converting emojis to textual descriptions, and applying ViSoLex lexical normalization.
- Experimental results indicate that converting emojis into textual descriptions improves several BERT-based baselines, while preserving emojis tends to work best for ViSoBERT and CafeBERT.
- Removing emojis generally reduces model performance, underscoring the importance of emoji information for fine-grained emotion classification.
- ViSoBERT achieves the best results, with a Macro F1 of 61.50% and a Weighted F1 of 63.26%. The results suggest that the dataset supports multiple architectures, while preprocessing strategy and annotation quality remain key determinants of performance.
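The emoji handling options compared above can be illustrated with a minimal Python sketch. This is not the paper's pipeline: the `EMOJI_TO_TEXT` table, the English descriptions, the function names, and the sample comment are all hypothetical and chosen only for illustration. ViSoLex normalization is not sketched here.

```python
import re

# Hypothetical emoji-to-description table (an assumption for illustration).
# A real pipeline would need a far larger mapping, and the paper may use
# different descriptions.
EMOJI_TO_TEXT = {
    "\U0001F602": "face with tears of joy",  # 😂
    "\U0001F621": "pouting face",            # 😡
    "\u2764": "red heart",                   # ❤
}

# Variation selector-16 sometimes trails emojis such as the red heart.
_VS16 = "\ufe0f"

# One alternation pattern that matches any emoji listed in the table.
_EMOJI_PATTERN = re.compile("|".join(re.escape(e) for e in EMOJI_TO_TEXT))


def _tidy(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())


def preserve_emojis(text: str) -> str:
    """Strategy 1: keep emojis as they are, only tidy whitespace."""
    return _tidy(text)


def emojis_to_text(text: str) -> str:
    """Strategy 2: replace each known emoji with its textual description."""
    text = text.replace(_VS16, "")
    replaced = _EMOJI_PATTERN.sub(lambda m: f" {EMOJI_TO_TEXT[m.group(0)]} ", text)
    return _tidy(replaced)


def remove_emojis(text: str) -> str:
    """Comparison setting: drop known emojis entirely."""
    text = text.replace(_VS16, "")
    return _tidy(_EMOJI_PATTERN.sub(" ", text))


if __name__ == "__main__":
    comment = "Phim hay qu\u00e1 \U0001F602\u2764"  # hypothetical comment
    print(preserve_emojis(comment))
    print(emojis_to_text(comment))
    print(remove_emojis(comment))
```

In the removal setting the two emojis, which carry most of the emotional signal in this short comment, disappear altogether. That is consistent with the reported drop in performance when emojis are removed.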