Beyond Surface Judgments: Human-Grounded Risk Evaluation of LLM-Generated Disinformation
arXiv cs.AI / 4/10/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that evaluating the disinformation risk of LLM-generated text requires measuring how human readers actually respond, rather than relying on LLM judges as a low-cost stand-in.
- Using 290 aligned articles, 2,043 paired human ratings, and outputs from eight frontier judge models, the authors audit judge-to-human alignment across overall scores, item-level ranking, and reliance on textual signals.
- Results show persistent gaps: LLM judges score more harshly than humans, only weakly recover human item-level rankings, and rely on different textual cues than human readers.
- The judge models penalize emotional intensity more strongly than human readers do and place more weight on logical rigor, indicating they are not merely mirroring human evaluation criteria.
- Although the judges agree strongly with each other, they align poorly with human readers, suggesting that inter-judge agreement is not a reliable indicator of validity as a proxy for reader response (see the sketch after this list).
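
To make the last point concrete, here is a minimal Python sketch of the agreement-versus-alignment distinction. It uses entirely synthetic data and assumed cue weights (the latent `emotion` and `logic` features, the coefficients, and the judge names are illustrative, not the paper's dataset or code): when eight simulated judges share a cue weighting that differs from the human one, pairwise inter-judge rank correlation comes out high while judge-to-human correlation stays low.

```python
# Illustrative simulation only: all data, weights, and names are assumptions.
import numpy as np
from scipy.stats import spearmanr
from itertools import combinations

rng = np.random.default_rng(0)
n_items = 290  # one risk score per aligned article

# Two latent textual cues per article.
emotion = rng.normal(size=n_items)  # emotional intensity
logic = rng.normal(size=n_items)    # logical rigor

# Assumed weighting: humans key mostly on emotional intensity,
# while judges penalize emotion and key mostly on logical rigor.
human = 0.8 * emotion + 0.2 * logic + rng.normal(0, 0.5, n_items)
judges = {
    f"judge_{i}": -0.3 * emotion + 0.9 * logic + rng.normal(0, 0.3, n_items)
    for i in range(8)
}

# Judge-to-human alignment: item-level Spearman rank correlation.
for name, scores in judges.items():
    rho = spearmanr(scores, human)[0]
    print(f"{name} vs humans: rho={rho:+.2f}")

# Inter-judge agreement: mean pairwise rank correlation. It comes out
# high even though judge-human alignment is weak, because the judges
# share the same (non-human) cue weighting.
pair_rhos = [spearmanr(judges[a], judges[b])[0]
             for a, b in combinations(judges, 2)]
print(f"mean inter-judge rho: {np.mean(pair_rhos):.2f}")
```

Run as written, the simulation prints strong inter-judge correlations alongside weak judge-to-human ones, which is the failure mode the paper warns about: consensus among judges can reflect shared bias rather than fidelity to readers.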
Related Articles
CIA is trusting AI to help analyze intel from human spies
Reddit r/artificial

LLM API Pricing in 2026: I Put Every Major Model in One Table
Dev.to

i generated AI video on a GTX 1660. here's what it actually takes.
Dev.to

Meta-Optimized Continual Adaptation for planetary geology survey missions for extreme data sparsity scenarios
Dev.to

How To Optimize Enterprise AI Energy Consumption
Dev.to