A Multi-Model Approach to English-Bangla Sentiment Classification of Government Mobile Banking App Reviews

arXiv cs.CL / April 16, 2026


Key Points

  • The paper studies English and Bangla sentiment in 5,652 Google Play reviews for four Bangladeshi government mobile banking apps, linking app quality to users’ financial access.
  • Using a hybrid labeling method (star ratings plus an independent XLM-RoBERTa classifier), the authors report moderate agreement between labeling approaches (kappa = 0.459).
  • Traditional machine-learning models outperform transformer baselines in this setting: Random Forest achieves the highest accuracy (0.815) and Linear SVM the highest weighted F1 (0.804), with fine-tuned XLM-RoBERTa slightly lower (0.793).
  • Aspect-level dissatisfaction is driven mainly by transaction speed and interface design, with the eJanata app receiving the worst ratings across apps.
  • The authors argue for data-driven policy actions—improving app quality, trust-centered release management, and “Bangla-first” NLP—citing a sizable 16.1-point accuracy gap between Bangla and English that underscores low-resource language challenges.
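The "moderate agreement" figure in the second bullet is Cohen's kappa, which corrects raw label agreement for agreement expected by chance. A minimal sketch with scikit-learn, using invented toy labels (the paper's actual inputs are star-rating labels and XLM-RoBERTa labels over 5,652 reviews):

```python
# Toy illustration of inter-method agreement via Cohen's kappa.
# These eight labels are invented for demonstration; the paper reports
# kappa = 0.459 between star-rating labels and classifier labels.
from sklearn.metrics import cohen_kappa_score

star_labels  = ["pos", "neg", "pos", "neu", "neg", "pos", "neg", "neu"]
model_labels = ["pos", "neg", "neu", "neu", "neg", "pos", "pos", "neg"]

kappa = cohen_kappa_score(star_labels, model_labels)
print(f"kappa = {kappa:.3f}")  # → kappa = 0.429
```

By the common Landis–Koch convention, values between 0.41 and 0.60 read as "moderate" agreement, which matches how the authors characterize their kappa of 0.459.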

Abstract

For millions of users in developing economies who depend on mobile banking as their primary gateway to financial services, app quality directly shapes financial access. The study analyzed 5,652 Google Play reviews in English and Bangla (filtered from 11,414 raw reviews) for four Bangladeshi government banking apps. The authors used a hybrid labeling approach that combined each reviewer's star rating with an independent XLM-RoBERTa classifier, yielding moderate inter-method agreement (kappa = 0.459). Traditional models outperformed transformer-based ones: Random Forest produced the highest accuracy (0.815) and Linear SVM the highest weighted F1 score (0.804), both exceeding fine-tuned XLM-RoBERTa (0.793). McNemar's test confirmed that all classical models were significantly superior to the off-the-shelf XLM-RoBERTa (p < 0.05), while differences with the fine-tuned variant were not statistically significant. DeBERTa-v3 was applied for aspect-level sentiment analysis across the reviews of the four apps; reviewers expressed dissatisfaction primarily with transaction speed and interface design, and the eJanata app received the worst ratings of all four. Based on these findings, three policy recommendations are made: remediation of app quality, trust-centred release management, and Bangla-first NLP adoption, to help state-owned banks improve their digital services through data-driven methods. Notably, a 16.1-percentage-point accuracy gap between Bangla and English text highlights the need for low-resource language model development.
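The significance claims rest on McNemar's test, which compares two classifiers evaluated on the same test set using only the discordant examples (those exactly one classifier gets right). A minimal sketch with invented counts, since the paper's per-review predictions are not reproduced here:

```python
# McNemar's test with continuity correction, implemented directly.
# b = examples only classifier A got right; c = only classifier B got right.
# The counts below are invented for illustration.
from scipy.stats import chi2

b, c = 62, 25
stat = (abs(b - c) - 1) ** 2 / (b + c)   # chi-squared statistic, 1 dof
p_value = chi2.sf(stat, df=1)            # upper-tail probability
print(f"statistic = {stat:.3f}, p = {p_value:.5f}")
```

With these toy counts the difference is significant at p < 0.05; in the paper, this is the outcome for each classical model versus the off-the-shelf XLM-RoBERTa, but not versus the fine-tuned variant.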