ControBench: An Interaction-Aware Benchmark for Controversial Discourse Analysis on Social Networks

arXiv cs.CL / 5/4/2026


Key Points

  • ControBench is a new benchmark for controversial discourse analysis that integrates interaction-aware social graphs with rich textual semantics to study argumentation across ideological divides online.
  • The dataset is built from Reddit discussions on three topics—Trump, abortion, and religion—and includes 7,370 users, 1,783 posts, and 26,525 interactions with semantically enriched user/post connections.
  • A key design choice is that each user-comment-user edge encodes both the reply and the specific parent comment it responds to, preserving local argumentative context.
  • Ideological user labels are derived from self-declared Reddit flairs, a scalable proxy that avoids manual annotation. The resulting graphs show low or negative adjusted homophily (Trump: -0.77, Abortion: 0.06, Religion: 0.04).
  • The authors evaluate graph neural networks, pretrained language models, and large language models, finding topic- and model-family-dependent performance patterns, particularly where ideological boundaries are ambiguous.
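
The user-comment-user edge described in the key points can be pictured with a small sketch. This is an illustration only, not the authors' released schema: the class names `Comment` and `ReplyEdge`, the field names, and the helper `build_reply_edges` are all hypothetical. The idea is that each edge keeps the reply text together with the parent comment text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Comment:
    # Hypothetical record for one comment in a thread.
    comment_id: str
    author: str
    text: str
    parent_id: Optional[str]  # None for a top-level comment


@dataclass(frozen=True)
class ReplyEdge:
    # Hypothetical user-comment-user edge: who replied to whom,
    # plus both the reply and the parent comment it responds to.
    replier: str
    target: str
    reply_text: str
    parent_text: str


def build_reply_edges(comments: List[Comment]) -> List[ReplyEdge]:
    """Create one edge per reply whose parent comment is available."""
    by_id = {c.comment_id: c for c in comments}
    edges = []
    for c in comments:
        parent = by_id.get(c.parent_id) if c.parent_id is not None else None
        if parent is None:
            continue  # top-level comment, or parent not in the data
        edges.append(
            ReplyEdge(
                replier=c.author,
                target=parent.author,
                reply_text=c.text,
                parent_text=parent.text,
            )
        )
    return edges
```

Keeping `parent_text` on the edge means a model that reads a single edge sees the local exchange, not just the reply in isolation.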

Abstract

Understanding how people argue across ideological divides online is important for studying political polarization, misinformation, and content moderation. Existing datasets capture only part of this problem: some preserve text but ignore interaction structure, some model structure without rich semantics, and others represent conversations without stable user-level ideological identity. We introduce ControBench, a benchmark for controversial discourse analysis that combines heterogeneous social interaction graphs with rich textual semantics. Built from Reddit discussions on three topics, Trump, abortion, and religion, ControBench contains 7,370 users, 1,783 posts, and 26,525 interactions. The graph contains user and post nodes connected by semantically enriched edges; in particular, user-comment-user edges encode both a reply and the parent comment that it responds to, preserving local argumentative context. User labels are derived from self-declared Reddit flairs, providing a scalable proxy for ideological identity without manual annotation. The resulting datasets exhibit low or negative adjusted homophily (Trump: -0.77, Abortion: 0.06, Religion: 0.04), reflecting the cross-cutting structure of real-world debate. We evaluate graph neural networks, pretrained language models, and large language models on ControBench and observe distinct performance patterns across topics and model families, especially when ideological boundaries are ambiguous. These results position ControBench as a challenging and realistic benchmark for controversial discourse analysis.
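
To make the homophily figures concrete, here is a minimal sketch of one common definition of adjusted homophily: edge homophily corrected for the share of edge endpoints each class holds. The abstract does not spell out the exact formula or which edges the authors use, so the definition, the function name `adjusted_homophily`, and the undirected user-user edge list are assumptions.

```python
from collections import Counter


def adjusted_homophily(edges, labels):
    """Adjusted homophily of an undirected graph (assumed definition).

    edges:  iterable of (u, v) node pairs
    labels: dict mapping node -> class label
    """
    edges = list(edges)
    if not edges:
        raise ValueError("need at least one edge")

    # Edge homophily: fraction of edges joining same-class nodes.
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    h_edge = same / len(edges)

    # Total degree per class: each edge contributes one endpoint per side.
    class_degree = Counter()
    for u, v in edges:
        class_degree[labels[u]] += 1
        class_degree[labels[v]] += 1

    # Same-class edge fraction expected if endpoints were paired at random.
    two_e = 2 * len(edges)
    expected = sum((d / two_e) ** 2 for d in class_degree.values())
    if expected == 1.0:
        raise ValueError("undefined when every endpoint has the same class")

    return (h_edge - expected) / (1 - expected)
```

Under this definition, values near 0 mean users connect to same-label users about as often as chance would predict, and negative values mean they interact with the opposing label more often than chance, which matches the strongly negative figure reported for the Trump topic.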