Benchmarking PyCaret AutoML Against IndoBERT Fine-Tuning for Sentiment Analysis on Indonesian IKN Twitter Data

arXiv cs.CL / April 29, 2026


Key Points

  • The study benchmarks PyCaret AutoML using classical models against IndoBERT fine-tuning for binary sentiment analysis on Indonesian IKN-related Twitter comments.
  • The dataset comprises 1,472 manually labeled samples (780 negative, 692 positive), and classical models were assessed with 10-fold cross-validation.
  • Logistic Regression performed best among the classical baselines, reaching 77.57% accuracy and 77.17% F1-score.
  • IndoBERT fine-tuning (indobenchmark/indobert-base-p1) achieved substantially higher results, with 89.59% test accuracy and 89.37% F1-score after five epochs.
  • The findings indicate that Transformer-based contextual representations substantially outperform classical AutoML baselines for sentiment classification of informal Indonesian social media text.
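As an illustration of the classical setup described above, here is a minimal sketch of a TF-IDF + Logistic Regression pipeline evaluated with 10-fold cross-validation. Note the hedges: the paper uses PyCaret rather than raw scikit-learn, and the tiny synthetic corpus below is a hypothetical stand-in for the 1,472 labeled tweets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in corpus; the real study uses 1,472 manually
# labeled IKN tweets (780 negative, 692 positive).
texts = [f"pembangunan ikn bagus dan maju {i}" for i in range(12)] + \
        [f"proyek ikn buruk dan boros {i}" for i in range(12)]
labels = [1] * 12 + [0] * 12  # 1 = positive, 0 = negative

# TF-IDF features feeding a Logistic Regression classifier.
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# 10-fold cross-validation, matching the evaluation protocol in the paper.
scores = cross_val_score(pipe, texts, labels, cv=10, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.4f}")
```

In PyCaret the equivalent step is handled by its `setup`/`compare_models` workflow, which ranks candidate classifiers under the same cross-validation scheme.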

Abstract

This paper benchmarks a classical machine learning approach based on PyCaret AutoML against a deep learning approach based on IndoBERT fine-tuning for binary sentiment analysis of Indonesian-language Twitter comments related to Ibu Kota Nusantara (IKN). The dataset contains 1,472 manually labeled samples, consisting of 780 negative and 692 positive comments. In the machine learning setting, Logistic Regression, Naive Bayes, and Support Vector Machine were evaluated using 10-fold cross-validation, with Logistic Regression achieving the best performance among the classical models at 77.57% accuracy and 77.17% F1-score. In the deep learning setting, the indobenchmark/indobert-base-p1 model was fine-tuned for five epochs and achieved 89.59% test accuracy and 89.37% F1-score. The results show that IndoBERT substantially outperforms the machine learning baselines, highlighting the effectiveness of Transformer-based contextual representations for informal Indonesian social media text.
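The accuracy and F1 figures reported in the abstract can be computed from model predictions with scikit-learn's metrics. A minimal sketch with toy predictions (these are illustrative values, not the paper's actual test-set outputs, and the paper may use a different F1 averaging than the binary default shown here):

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy gold labels and predictions (1 = positive, 0 = negative);
# illustrative only, not the paper's IndoBERT outputs.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

acc = accuracy_score(y_true, y_pred)  # 4 of 6 correct -> 0.6667
f1 = f1_score(y_true, y_pred)         # precision = recall = 2/3 -> 0.6667
print(f"accuracy={acc:.4f}  f1={f1:.4f}")
```

With `f1_score(..., average="weighted")` or `average="macro"` the score is aggregated over both classes, which matters when the classes are imbalanced as in this dataset (780 vs. 692).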