Who Shapes Brazil's Vaccine Debate? Semi-Supervised Modeling of Stance and Polarization in YouTube's Media Ecosystem

arXiv cs.AI / 4/22/2026

Key Points

  • The paper analyzes Brazil’s long-term YouTube vaccine debate, addressing gaps in prior research that often relied on English-only data, short windows, or single-vaccine studies.
  • It uses a semi-supervised stance-detection approach (self-labeling and self-training) to classify nearly 1.4 million YouTube comments, which the authors report substantially improves classification robustness.
  • By combining stance with temporal dynamics, engagement metrics, and channel types (legacy media, science communicators, and digital-native outlets), the study maps how pro- and anti-vaccine narratives spread and change over time.
  • The research finds that polarization rises sharply during epidemiological crises like COVID-19 but becomes more fragmented across vaccines and interaction patterns after the pandemic.
  • Science communication and digital-native channels are identified as major hotspots for both supportive and oppositional engagement, suggesting structural vulnerabilities in current health communication.
  • The framework and evidence are intended to inform public health agencies, platform governance, and broader efforts to manage online information ecosystems.
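
To make the self-training idea in the second point concrete, here is a minimal sketch in plain Python. It is not the paper's pipeline: the summary does not specify the base classifier, the features, the confidence threshold, or the label set, so the tiny Naive Bayes model, the "pro"/"anti" labels, the 0.75 threshold, and every function name below are assumptions chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenizer; a real system would use a proper Portuguese tokenizer.
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical stand-in model)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, text):
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            n_words = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total_docs)
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log(
                        (self.word_counts[c][w] + 1) / (n_words + len(self.vocab))
                    )
            log_scores[c] = score
        # Normalize log scores into probabilities.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {c: v / z for c, v in exp_scores.items()}


def self_train(labeled, unlabeled, threshold=0.75, max_rounds=5):
    """Iteratively move confidently predicted comments into the training set."""
    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = TinyNaiveBayes().fit(texts, labels)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for text in pool:
            probs = model.predict_proba(text)
            label, confidence = max(probs.items(), key=lambda kv: kv[1])
            if confidence >= threshold:
                texts.append(text)   # accept the pseudo-label
                labels.append(label)
                added += 1
            else:
                remaining.append(text)
        pool = remaining
        if added == 0:
            break  # nothing confident enough; stop early
        model = TinyNaiveBayes().fit(texts, labels)
    return model, pool


# Toy usage with made-up comments.
seed = [("vaccines save lives", "pro"), ("vaccines are poison", "anti")]
unlabeled = ["vaccines save children lives", "poison in every dose", "what time is it"]
model, still_unlabeled = self_train(seed, unlabeled)
```

The confidence threshold is the main design choice in a loop like this: a high value admits fewer pseudo-labels but limits the risk of the model reinforcing its own mistakes.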

Abstract

Vaccination remains a cornerstone of global public health, yet the COVID-19 pandemic exposed how online misinformation, political polarization, and declining institutional trust can undermine immunization efforts. Most of the prior computational studies that analyzed vaccine discourse on social platforms focus on English-language data, specific vaccines, or short time windows, impairing our understanding of long-term dynamics in high-impact, non-English contexts like Brazil, home to one of the world's most comprehensive immunization systems. We here present the largest longitudinal study of Brazil's vaccine discourse on YouTube, leveraging a semi-supervised stance detection framework that combines self-labeling and self-training to classify nearly 1.4 million comments. By integrating stance with temporal patterns, engagement metrics, and channel taxonomy (legacy media, science communicators, digital-native outlets), we map how pro- and anti-vaccine narratives evolve and circulate within a hybrid media ecosystem. Our results show that semi-supervised learning substantially improves stance classification robustness, enabling fine-grained tracking of public attitudes across Brazil's full immunization schedule. Polarization spikes during epidemiological crises, especially COVID-19, but becomes fragmented across vaccines and interaction patterns in the post-pandemic period. Notably, science communication and digital-native channels emerge as the primary loci of both supportive and oppositional engagement, revealing structural vulnerabilities in contemporary health communication. Thus, our work advances computational methods for large-scale stance modeling while offering actionable evidence for public health agencies, platform governance, and online information ecosystems.
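
As a rough illustration of what integrating stance with temporal patterns, engagement metrics, and channel taxonomy could look like in practice, the sketch below groups already-classified comments by month and channel type and computes the share of each stance. The record fields (`month`, `channel_type`, `stance`, `likes`) and the like-based weighting are hypothetical; the summary does not describe the paper's actual data schema or polarization measure.

```python
from collections import defaultdict


def stance_shares(comments, weight=lambda c: 1):
    """Return {(month, channel_type): {stance: share}}.

    `weight` maps a comment to its contribution; the default counts each
    comment once, while e.g. `lambda c: c["likes"]` weights by engagement.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for c in comments:
        totals[(c["month"], c["channel_type"])][c["stance"]] += weight(c)
    shares = {}
    for key, by_stance in totals.items():
        group_total = sum(by_stance.values())
        if group_total == 0:
            continue  # skip groups with no weight (e.g. zero likes everywhere)
        shares[key] = {s: v / group_total for s, v in by_stance.items()}
    return shares


# Toy usage with made-up records.
comments = [
    {"month": "2021-03", "channel_type": "legacy_media", "stance": "pro", "likes": 3},
    {"month": "2021-03", "channel_type": "legacy_media", "stance": "anti", "likes": 1},
    {"month": "2021-03", "channel_type": "science_communicator", "stance": "pro", "likes": 5},
]
by_count = stance_shares(comments)
by_likes = stance_shares(comments, weight=lambda c: c["likes"])
```

Comparing the count-based and like-weighted shares for the same group gives a simple view of whether one stance attracts disproportionate engagement relative to its volume.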