When Hate Meets Facts: LLMs-in-the-Loop for Check-worthiness Detection in Hate Speech

arXiv cs.CL / 3/27/2026


Key Points

  • The paper releases WSF-ARG+, a dataset that jointly labels hate speech with “check-worthiness” (whether embedded claims merit fact-checking) to address the overlap between hate content and misinformation.
  • It proposes an LLM-in-the-loop annotation framework that uses 12 open-weight LLMs to reduce human annotation effort while maintaining annotation quality, validated via extensive human evaluation.
  • The authors find that hate speech containing check-worthy claims is associated with significantly higher harassment and hate intensity.
  • Incorporating check-worthiness labels improves LLM-based hate speech detection, with gains of up to 0.213 macro-F1, and 0.154 macro-F1 on average, for large models.

Abstract

Hateful content online is often expressed using fact-like, though not necessarily correct, information, especially in coordinated online harassment campaigns and extremist propaganda. Failing to address hate speech (HS) and misinformation jointly can deepen prejudice, reinforce harmful stereotypes, and expose bystanders to psychological distress, while polluting public debate. Moreover, such messages demand more effort from content moderators, who must assess both their harmfulness and their veracity, i.e., fact-check them. To address this challenge, we release WSF-ARG+, the first dataset that combines hate speech with check-worthiness information. We also introduce a novel LLM-in-the-loop framework to facilitate the annotation of check-worthy claims. We test the framework with 12 open-weight LLMs of different sizes and architectures, and validate it through extensive human evaluation, showing that it reduces human effort without compromising the annotation quality of the data. Finally, we show that HS messages with check-worthy claims exhibit significantly higher harassment and hate, and that incorporating check-worthiness labels improves LLM-based HS detection by up to 0.213 macro-F1, and by 0.154 macro-F1 on average, for large models.
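The abstract does not specify how the 12 LLMs' judgments are combined before human review. A minimal sketch of one common LLM-in-the-loop pattern, auto-accepting labels when the model panel agrees and escalating disagreements to human annotators, might look like this (the function name, labels, and agreement threshold are illustrative assumptions, not the paper's method):

```python
from collections import Counter

def route_annotation(votes, agreement_threshold=0.9):
    """Decide whether an item can be auto-labeled from LLM votes
    or must be escalated to a human annotator.

    votes: list of labels (e.g. "check-worthy" / "not-check-worthy")
           produced by the LLM panel for one message.
    Returns ("auto", label) on high consensus, else ("human", None).
    """
    if not votes:
        return ("human", None)
    # Majority label and its share of the panel
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    if agreement >= agreement_threshold:
        return ("auto", label)   # high consensus: accept the LLM label
    return ("human", None)       # disagreement: send to a human annotator

# Example: 12 hypothetical model votes for one message
votes = ["check-worthy"] * 11 + ["not-check-worthy"]
decision, label = route_annotation(votes)  # 11/12 agreement -> auto-labeled
```

Under this kind of routing, human effort is spent only on the contested items, which is consistent with the paper's claim of reducing annotation effort without degrading quality.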