When Hate Meets Facts: LLMs-in-the-Loop for Check-worthiness Detection in Hate Speech

arXiv cs.CL / 3/27/2026


Key Points

  • The paper releases WSF-ARG+, a dataset that jointly labels hate speech with “check-worthiness” (whether embedded claims merit fact-checking) to address the overlap between hate content and misinformation.
  • It proposes an LLM-in-the-loop annotation framework that uses 12 open-weight LLMs to reduce human annotation effort while maintaining annotation quality, validated via extensive human evaluation.
  • The authors find that hate speech containing check-worthy claims is associated with significantly higher harassment and hate intensity.
  • Incorporating check-worthiness labels improves LLM-based hate speech detection, with gains of up to 0.213 macro-F1, and 0.154 macro-F1 on average, for large models.

Abstract

Hateful content online is often expressed using fact-like, though not necessarily correct, information, especially in coordinated online harassment campaigns and extremist propaganda. Failing to address hate speech (HS) and misinformation jointly can deepen prejudice, reinforce harmful stereotypes, and expose bystanders to psychological distress, while polluting public debate. Moreover, such messages demand more effort from content moderators, who must assess both their harmfulness and their veracity, i.e., fact-check them. To address this challenge, we release WSF-ARG+, the first dataset that combines hate speech with check-worthiness information. We also introduce a novel LLM-in-the-loop framework to facilitate the annotation of check-worthy claims. We test the framework with 12 open-weight LLMs of different sizes and architectures, and validate it through extensive human evaluation, showing that it reduces human effort without compromising the annotation quality of the data. Finally, we show that HS messages with check-worthy claims exhibit significantly higher harassment and hate, and that incorporating check-worthiness labels improves LLM-based HS detection by up to 0.213 macro-F1, and by 0.154 macro-F1 on average, for large models.
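The abstract does not specify how the 12 LLMs' judgments are combined before human review. A minimal sketch of one common LLM-in-the-loop pattern, auto-accepting labels when the model panel agrees and escalating disagreements to human annotators, might look like this (the function name, labels, and agreement threshold are illustrative assumptions, not the paper's method):

```python
from collections import Counter

def route_annotation(votes, agreement_threshold=0.9):
    """Decide whether an item can be auto-labeled from LLM votes
    or must be escalated to a human annotator.

    votes: list of labels (e.g. "check-worthy" / "not-check-worthy")
           produced by the LLM panel for one message.
    Returns ("auto", label) on high consensus, else ("human", None).
    """
    if not votes:
        return ("human", None)
    # Majority label and its share of the panel
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    if agreement >= agreement_threshold:
        return ("auto", label)   # high consensus: accept the LLM label
    return ("human", None)       # disagreement: send to a human annotator

# Example: 12 hypothetical model votes for one message
votes = ["check-worthy"] * 11 + ["not-check-worthy"]
decision, label = route_annotation(votes)  # 11/12 agreement -> auto-labeled
```

Under this kind of routing, human effort is spent only on the contested items, which is consistent with the paper's claim of reducing annotation effort without degrading quality.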