An Information-theoretic Propagation Denoising and Fusion Framework for Fake News Detection

arXiv cs.CL / 5/5/2026


Key Points

  • The paper addresses a key limitation in fake news detection: incomplete or unreliable propagation data makes robust classification difficult, especially when synthetic interactions are generated to fill gaps.
  • It critiques direct fusion of synthetic propagation (often produced via LLM role-playing) because the synthetic data can introduce biased, low-quality signals that hurt representation learning.
  • The authors propose InfoPDF, an information-theoretic propagation denoising and fusion framework that treats attribute-specific synthetic propagation as probabilistic latent distributions to enable reliability-aware fusion with real propagation.
  • Training uses a mutual-information-based objective that (i) suppresses noisy signals in each attribute-specific synthetic propagation, (ii) preserves consistency between real and synthetic representations, and (iii) keeps representations sufficient for fake news detection and attribute prediction.
  • Experiments on three real-world datasets show InfoPDF delivers consistently better results across multiple fake news detection tasks and can estimate attribute-level reliability while learning more discriminative propagation representations.
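
To make the idea of reliability-aware fusion concrete, the following is a minimal sketch and not the InfoPDF implementation. It assumes each attribute-specific synthetic propagation has already been encoded as a diagonal Gaussian (a mean and a variance per dimension), and it uses the inverse of the average variance as a stand-in reliability score. The function name, the attribute names, and the weighting rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Gaussian = Tuple[List[float], List[float]]  # (mean, variance) per dimension


def fuse_with_reliability(
    real: List[float],
    synthetic: Dict[str, Gaussian],
    real_weight: float = 1.0,
) -> Tuple[List[float], Dict[str, float]]:
    """Fuse a real embedding with Gaussian synthetic embeddings.

    Assumption (not from the paper): an attribute's reliability is the
    inverse of its average variance, so uncertain synthetic propagation
    contributes less to the fused representation.
    """
    scores: Dict[str, float] = {}
    for name, (mean, var) in synthetic.items():
        if len(mean) != len(real) or len(var) != len(real):
            raise ValueError(f"dimension mismatch for attribute {name!r}")
        if any(v <= 0.0 for v in var):
            raise ValueError(f"variances must be positive for {name!r}")
        scores[name] = len(var) / sum(var)  # inverse of the mean variance

    # Normalise so that the real weight and all attribute weights sum to 1.
    total = real_weight + sum(scores.values())
    fused = [real_weight * r / total for r in real]
    for name, (mean, _) in synthetic.items():
        weight = scores[name] / total
        for i, m in enumerate(mean):
            fused[i] += weight * m

    reliabilities = {name: s / total for name, s in scores.items()}
    return fused, reliabilities
```

For example, with a real embedding `[1.0, 0.0]` and two hypothetical attributes whose average variances are 1.0 and 2.0, the lower-variance attribute receives twice the fusion weight of the other. In a trained model the variances would be produced by an encoder rather than supplied by hand.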

Abstract

Incomplete propagation data significantly hinders robust fake news detection. Recent approaches leverage large language models to simulate missing user interactions via role-playing, thereby enriching propagation with synthetic signals. However, such propagation data is intrinsically unreliable, and directly fusing it can bias the learned representations and limit detection performance. In this paper, we address the unreliability of synthetic propagation from a mutual-information perspective and propose a novel information-theoretic propagation denoising and fusion (InfoPDF) framework to learn effective representations from both real and synthetic propagation. Specifically, we first generate attribute-specific synthetic propagation using large language models. Then we model each synthetic propagation graph as a probabilistic latent distribution to guide reliability-aware adaptive fusion with real propagation. During training, we design a mutual-information-based objective to learn compressed and task-sufficient propagation representations. It jointly suppresses noisy signals across attribute-specific synthetic propagation, maintains consistency between real and synthetic propagation representations, and ensures task sufficiency for fake news detection and attribute prediction. Experiments on three real-world datasets show that InfoPDF consistently achieves superior performance across various fake news detection tasks. Further analysis demonstrates that InfoPDF can estimate attribute-level reliabilities and learn more discriminative propagation representations.
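
The abstract does not state the objective itself. As a rough illustration only, an information-bottleneck-style loss with the three ingredients it describes could be written as below. All symbols are assumptions introduced here: $G_s^{k}$ is the synthetic propagation for attribute $k$, $Z_k$ its latent representation, $Z_r$ the real-propagation representation, $Z$ the fused representation, $Y$ the fake news label, $A_k$ the attribute label, and $\beta$, $\lambda$, $\gamma$ are trade-off weights.

```latex
\mathcal{L}
  = \underbrace{\beta \sum_{k} I\!\left(G_s^{k};\, Z_k\right)}_{\text{compression (noise suppression)}}
  \;-\; \underbrace{\lambda \sum_{k} I\!\left(Z_r;\, Z_k\right)}_{\text{real--synthetic consistency}}
  \;-\; \underbrace{I\!\left(Z;\, Y\right)}_{\text{detection sufficiency}}
  \;-\; \underbrace{\gamma \sum_{k} I\!\left(Z_k;\, A_k\right)}_{\text{attribute sufficiency}}
```

Minimizing the first term discards information in each synthetic graph that the latent does not need, while the remaining terms are maximized so that the representations stay aligned with real propagation and remain predictive of the label and the attributes. The paper's actual formulation and the bounds used to estimate these terms may differ.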