Towards Automated Community Notes Generation with Large Vision Language Models for Combating Contextual Deception

arXiv cs.CL / 3/25/2026


Key Points

  • The paper examines how to automate Community Notes for image-based contextual deception. In this setting, an authentic image is paired with misleading context (time, entity, or event), and that context must be corrected rather than simply labeled true or false.
  • It introduces XCheck, a real-world dataset of X posts paired with their Community Notes and external contextual references, addressing the scarcity of suitable data in prior work.
  • The authors propose ACCNote (Automated Context-Corrective Note generation), a retrieval-augmented, multi-agent framework built on large vision-language models that generates concise, grounded, context-corrective notes and is designed to handle the dynamic nature of contextual deception.
  • They also define a new evaluation metric, the Context Helpfulness Score (CHS), which aligns with user-study outcomes, that is, whether generated notes actually improve user understanding, rather than relying on lexical overlap.
  • Experiments on XCheck show that ACCNote improves both deception detection and note-generation quality over the baselines and outperforms a commercial tool, GPT5-mini.

Abstract

Community Notes have emerged as an effective crowd-sourced mechanism for combating online deception on social media platforms. However, their reliance on human contributors limits both timeliness and scalability. In this work, we study automated Community Notes generation for image-based contextual deception, where an authentic image is paired with misleading context (e.g., time, entity, or event). Unlike prior work that primarily focuses on deception detection (i.e., judging in a binary manner whether a post is true or false), Community Notes-style systems need to generate concise, grounded notes that help users recover the missing or corrected context. This problem remains underexplored for three reasons: (i) datasets that support such research are scarce; (ii) methods must handle the dynamic nature of contextual deception; and (iii) evaluation is difficult because standard metrics do not capture whether notes actually improve user understanding. To address these gaps, we curate XCheck, a real-world dataset of X posts with associated Community Notes and external contexts. We further propose ACCNote (Automated Context-Corrective Note generation), a retrieval-augmented, multi-agent collaboration framework built on large vision-language models. Finally, we introduce a new evaluation metric, the Context Helpfulness Score (CHS), which aligns with user study outcomes rather than relying on lexical overlap. Experiments on XCheck show that ACCNote improves both deception detection and note generation performance over baselines and outperforms the commercial tool GPT5-mini. Together, our dataset, method, and metric advance the practical automated generation of context-corrective notes toward more responsible online social networks.
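The point that lexical-overlap metrics miss whether a note actually corrects the context can be made concrete. The sketch below uses a generic ROUGE-1-style unigram F1 as an assumed stand-in for "standard metrics". It is not CHS and not a metric defined in the paper, and the function names and example notes are hypothetical illustrations, not taken from XCheck. A note that swaps the two dates, and so restates the deception instead of correcting it, scores a perfect 1.0. An accurate paraphrase scores far lower.

```python
from collections import Counter
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs; punctuation is dropped, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1: clipped unigram overlap between candidate and reference."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference note and candidate notes, for illustration only.
reference = "This photo was taken in 2015 during floods in Chennai, not in 2024."
swapped = "This photo was taken in 2024 during floods in Chennai, not in 2015."
paraphrase = "The image shows the 2015 Chennai flooding; it is not from 2024."

print(round(unigram_f1(swapped, reference), 3))     # → 1.0
print(round(unigram_f1(paraphrase, reference), 3))  # → 0.32
```

Because overlap counts ignore which token fills which role, the swapped note contains exactly the same multiset of words as the reference, so the metric cannot distinguish a corrective note from one that reverses the correction. This is the kind of gap that a user-grounded score such as CHS is described as addressing.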