Are LLMs More Skeptical of Entertainment News?

arXiv cs.AI / 5/5/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The study investigates whether zero-shot LLMs apply different credibility standards across journalistic genres, specifically entertainment news versus hard news.
  • Using a within-dataset design on GossipCop from FakeNewsNet, two of four frontier models (DeepSeek-V3.2 and GPT-5.2) show significantly higher false-positive rates for legitimate entertainment news, with gaps of 10.1 and 8.8 percentage points respectively.
  • Two other models (Claude Opus 4.6 and Gemini 3 Flash) do not exhibit comparable genre asymmetry, indicating the effect is model-dependent.
  • Style-swap experiments produce only limited, inconsistent changes, and prompt-based mitigation is not generic: framing the model as an entertainment-news fact-checker cuts DeepSeek-V3.2's false positives by about 50% with no measurable recall loss, but provides little benefit for GPT-5.2.
  • Qualitative analysis suggests recurring error patterns, including treating private-life claims as inherently unverifiable and dismissing entertainment journalism as an epistemically weaker genre, implying that aggregate accuracy metrics can hide structured false positives.
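
The core measurement behind these key points is a genre-stratified false-positive rate with a significance test on the gap. The summary does not state which test the authors used, so the sketch below assumes a standard two-proportion z-test, implemented with only the Python standard library:

```python
import math

def false_positive_rate(labels, preds):
    """FPR = fraction of legitimate items (label 0) flagged as fake (pred 1)."""
    real_preds = [p for y, p in zip(labels, preds) if y == 0]
    return sum(real_preds) / len(real_preds)

def fpr_gap_z_test(labels_a, preds_a, labels_b, preds_b):
    """Compare FPRs of two genre subsets (e.g. entertainment vs. hard news).

    Returns (fpr_a - fpr_b, two-sided p-value) under a pooled
    two-proportion z-test -- an assumed choice, not necessarily
    the paper's exact procedure.
    """
    def fp_counts(labels, preds):
        real_preds = [p for y, p in zip(labels, preds) if y == 0]
        return sum(real_preds), len(real_preds)

    x_a, n_a = fp_counts(labels_a, preds_a)
    x_b, n_b = fp_counts(labels_b, preds_b)
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, p_value
```

With labels and predictions split by genre, a reported gap such as DeepSeek-V3.2's 10.1 percentage points corresponds to `fpr_gap_z_test` returning a gap of about 0.101 with p < .001 at the dataset's sample sizes.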

Abstract

Large language models (LLMs) are increasingly used for automated news credibility assessment, yet it remains unclear whether they apply even-handed standards across journalistic genres. We examine whether zero-shot LLMs are more likely to misclassify legitimate entertainment news as fake than legitimate hard news, using a within-dataset design on GossipCop from FakeNewsNet. Across four frontier models, we find a clear but model-specific genre asymmetry: DeepSeek-V3.2 and GPT-5.2 show false-positive-rate gaps of 10.1 and 8.8 percentage points, respectively (both p < .001), whereas Claude Opus 4.6 and Gemini 3 Flash show no comparable difference. A style-swap experiment yields only limited and inconsistent changes, suggesting that the asymmetry is not reducible to stylistic register alone. Prompt-based mitigation is likewise possible but not generic: framing the model as an entertainment-news fact-checker reduces false positives for DeepSeek-V3.2 by about 50% without detectable recall loss, but offers little improvement for GPT-5.2. Exploratory qualitative coding further suggests two recurring error patterns in sampled false positives: treating private-life claims as inherently unverifiable and discounting entertainment journalism as an epistemically weaker genre. Taken together, these findings show that aggregate performance metrics can obscure structured false positives within legitimate journalism. We argue that LLM-based credibility assessment may not only evaluate truth claims but also differentially recognize the legitimacy of journalistic genres, and that evaluation should therefore include genre-stratified false-positive analysis alongside overall accuracy.
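
The "entertainment-news fact-checker" framing described in the abstract is a prompt-level intervention. The exact wording the authors used is not given here, so the following is a hypothetical sketch of what such persona framing might look like; `BASELINE_PROMPT`, `GENRE_FRAMED_PROMPT`, and `build_prompt` are illustrative names, not the paper's:

```python
# Hypothetical prompt templates for genre-framed credibility assessment.
# The wording is an assumption for illustration; the paper's actual
# prompts are not reproduced in this summary.
BASELINE_PROMPT = (
    "You are a fact-checker. Decide whether the following news article is "
    "real or fake. Answer with exactly one word: REAL or FAKE.\n\n{article}"
)

GENRE_FRAMED_PROMPT = (
    "You are an experienced entertainment-news fact-checker. Entertainment "
    "reporting is a legitimate journalistic genre, and private-life claims "
    "are often verifiable through publicists, court records, and "
    "on-the-record interviews. Decide whether the following news article is "
    "real or fake. Answer with exactly one word: REAL or FAKE.\n\n{article}"
)

def build_prompt(article: str, genre_framed: bool = False) -> str:
    """Fill the chosen template with the article text."""
    template = GENRE_FRAMED_PROMPT if genre_framed else BASELINE_PROMPT
    return template.format(article=article)
```

The paper's finding that this framing helps DeepSeek-V3.2 but barely helps GPT-5.2 suggests such mitigations need to be validated per model rather than assumed transferable.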