Benchmarking PyCaret AutoML Against BiLSTM for Fine-Grained Emotion Classification: A Comparative Study on 20-Class Emotion Detection

arXiv cs.CL / 4/30/2026

Key Points

  • The paper benchmarks classical ML and deep learning models for fine-grained emotion classification across 20 emotion classes using the 20-Emotion Text Classification Dataset (79,595 English sentences).
  • On the classical ML side, Logistic Regression, Multinomial Naive Bayes, and SVM are evaluated on TF-IDF features.
  • For deep learning, BiLSTM, GRU, and a lightweight Transformer implemented in PyTorch are compared.
  • BiLSTM delivers the strongest overall results at 89% accuracy and a weighted F1-score of 0.89, narrowly beating the best ML baseline (SVM at 88.11% accuracy).
  • The study concludes that sequence-based deep learning models can capture contextual emotional signals more effectively, while traditional ML remains competitive and computationally efficient.
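
To make the classical baseline setup concrete, here is a minimal sketch of TF-IDF features feeding the three models named above, scored with accuracy and weighted F1. It assumes scikit-learn, which the summary does not name. The toy sentences, the `evaluate_baselines` helper, the choice of `LinearSVC` for "SVM", and all hyperparameters are illustrative assumptions, not the paper's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data (hypothetical); the real dataset has 79,595 sentences and 20 classes.
train_texts = [
    "I am so happy today", "what a joyful wonderful morning",
    "this makes me furious", "I am angry at the delay",
    "I feel sad and alone", "such a sorrowful gloomy evening",
    "I am scared of the dark", "that noise terrified me",
]
train_labels = ["happiness", "happiness", "anger", "anger",
                "sadness", "sadness", "fear", "fear"]
test_texts = ["a happy joyful day", "I am angry and furious"]
test_labels = ["happiness", "anger"]

# The three classical models from the summary; a linear SVM is an assumption,
# since the source does not state which kernel was used.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "multinomial_nb": MultinomialNB(),
    "svm": LinearSVC(),
}


def evaluate_baselines(models, train_texts, train_labels, test_texts, test_labels):
    """Fit each model on TF-IDF features and report accuracy and weighted F1."""
    results = {}
    for name, clf in models.items():
        # The pipeline fits the vectorizer on training text only, then the classifier.
        pipe = make_pipeline(TfidfVectorizer(), clf)
        pipe.fit(train_texts, train_labels)
        pred = pipe.predict(test_texts)
        results[name] = {
            "accuracy": accuracy_score(test_labels, pred),
            # "weighted" averages per-class F1 by each class's support.
            "weighted_f1": f1_score(test_labels, pred, average="weighted"),
        }
    return results


results = evaluate_baselines(models, train_texts, train_labels, test_texts, test_labels)
for name, scores in results.items():
    print(name, scores)
```

Scores on toy data like this say nothing about the reported results. The point is only the shape of the comparison: one shared feature extractor, several interchangeable classifiers, and the same two metrics for each.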

Abstract

Fine-grained emotion classification, which identifies specific emotional states such as happiness, anger, sadness, and fear, remains a challenging task in natural language processing. This study benchmarks classical machine learning and deep learning approaches for 20-class emotion classification using the 20-Emotion Text Classification Dataset containing 79,595 English sentences. On the machine learning side, Logistic Regression, Multinomial Naive Bayes, and Support Vector Machine are evaluated using TF-IDF features. On the deep learning side, Bidirectional Long Short-Term Memory, Gated Recurrent Unit, and a lightweight Transformer implemented in PyTorch are compared. The results show that BiLSTM achieves the best overall performance with 89% accuracy and a weighted F1-score of 0.89, slightly outperforming the best machine learning model, SVM, which reaches 88.11% accuracy. The findings indicate that while traditional machine learning models remain competitive and computationally efficient, sequence-based deep learning models better capture contextual emotional cues in text.
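
As an illustration of the sequence-based side, the following is a minimal PyTorch sketch of a BiLSTM classifier with a 20-way output. The class name `BiLSTMClassifier`, the layer sizes, the vocabulary size, and the use of the final hidden states as the sentence representation are assumptions made for illustration. The abstract does not describe the paper's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn


class BiLSTMClassifier(nn.Module):
    """Hypothetical single-layer BiLSTM text classifier (not the paper's exact model)."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=64, num_classes=20, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Forward and backward states are concatenated, hence 2 * hidden_dim.
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)      # h_n: (2, batch, hidden_dim)
        # h_n[0] is the final forward-direction state, h_n[1] the final backward one.
        # With padded batches one would normally pack the sequences first;
        # that step is omitted here for brevity.
        sentence_vec = torch.cat([h_n[0], h_n[1]], dim=1)
        return self.fc(sentence_vec)           # (batch, num_classes) logits


# Usage on a random batch of token ids (assumed vocabulary of 1,000 tokens).
model = BiLSTMClassifier(vocab_size=1000)
batch = torch.randint(1, 1000, (4, 12))       # 4 sentences, 12 tokens each
logits = model(batch)
print(logits.shape)
```

Reading the sentence in both directions is what lets a model of this kind use context on either side of a word, which is the property the abstract credits for the BiLSTM's edge over the TF-IDF baselines.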