A New Semisupervised Technique for Polarity Analysis using Masked Language Models

arXiv cs.CL / 4/30/2026


Key Points

  • The paper proposes a new semisupervised polarity analysis method by adapting Latent Semantic Scaling (LSS) to use word2vec as a masked language model.
  • Unlike the original spatial models, the method scores polarity as the predicted probability that seed words occur in a given context.
  • The author reports that these probabilistic polarity scores are more accurate, interpretable, and consistent than those produced by spatial models.
  • The author demonstrates this by applying both probabilistic and spatial models to China Daily's coverage of China and other countries during the COVID-19 pandemic, scoring articles on achievement in health issues.
  • The study suggests that using more advanced masked language models could further improve the technique’s effectiveness for polarity analysis.

Abstract

I developed a new version of Latent Semantic Scaling (LSS) employing word2vec as a masked language model. Unlike the original spatial models, it assigns polarity scores to words and documents as the predicted probabilities that seed words occur in given contexts. These probabilistic polarity scores are more accurate, interpretable and consistent than those that spatial models can produce in text analysis. I demonstrate these advantages by applying both probabilistic and spatial models to China Daily's coverage of China and other countries during the coronavirus disease (COVID) pandemic in terms of achievement in health issues. The results suggest that more advanced masked language models would further improve the semisupervised machine learning technique.
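
To make the contrast between spatial and probabilistic scoring concrete, here is a minimal Python sketch. It is not the author's implementation: the vocabulary, the seed words, the random vectors, and both function names are hypothetical, and the probabilistic score assumes a CBOW-style softmax over a single context word, which may differ from the formulation in the paper.

```python
import numpy as np

# Hypothetical toy vocabulary and seed words (+1 positive, -1 negative).
vocab = ["good", "bad", "recovery", "outbreak", "vaccine", "death"]
seeds = {"good": 1.0, "bad": -1.0}

# Random stand-ins for trained word2vec input (context) and output (target) vectors.
rng = np.random.default_rng(0)
dim = 8
in_vecs = rng.normal(size=(len(vocab), dim))
out_vecs = rng.normal(size=(len(vocab), dim))


def spatial_polarity(vecs, vocab, seeds):
    """Spatial score: weighted mean cosine similarity to the seed words."""
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    idx = [vocab.index(s) for s in seeds]
    weights = np.array(list(seeds.values()))
    return (unit @ unit[idx].T) @ weights / len(seeds)


def probabilistic_polarity(in_vecs, out_vecs, vocab, seeds):
    """Probabilistic score: weighted sum of P(seed | context word)."""
    # Row w holds the logits for every target word given context word w.
    logits = in_vecs @ out_vecs.T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # softmax over target words
    idx = [vocab.index(s) for s in seeds]
    weights = np.array(list(seeds.values()))
    return probs[:, idx] @ weights


spatial = spatial_polarity(in_vecs, vocab, seeds)
prob = probabilistic_polarity(in_vecs, out_vecs, vocab, seeds)

for word, s, p in zip(vocab, spatial, prob):
    print(f"{word:10s} spatial={s:+.3f} probabilistic={p:+.3f}")
```

With seed weights of +1 and -1, the probabilistic score in this sketch is simply the predicted probability of the positive seed minus that of the negative seed, so it has a direct reading as a difference of probabilities. The spatial score is an average of cosine similarities, which has no such probabilistic interpretation.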
