Co-FactChecker: A Framework for Human-AI Collaborative Claim Verification Using Large Reasoning Models

arXiv cs.CL / April 16, 2026


Key Points

  • The paper argues that current LLM/LRM-based claim verification struggles because models lack the domain grounding and contextual understanding that professional fact-checkers use.
  • It proposes Co-FactChecker, a human-AI collaborative framework that converts expert feedback into targeted “trace-edits” to modify the model’s reasoning trace.
  • Co-FactChecker introduces an interaction paradigm where the model’s thinking trace functions as a shared scratchpad, avoiding limitations of natural-language multi-turn dialogue for calibration.
  • The authors provide theoretical analysis suggesting trace-editing can outperform multi-turn dialogue-based collaboration, and report automatic evaluations where Co-FactChecker beats prior autonomous and human-AI approaches.
  • Human evaluations also find that Co-FactChecker yields higher-quality reasoning and verdicts, and that its thinking traces are easier to interpret and more useful than those produced by multi-turn dialogue.

Abstract

Professional fact-checkers rely on domain knowledge and deep contextual understanding to verify claims. Large language models (LLMs) and large reasoning models (LRMs) lack such grounding and primarily reason from available evidence alone, creating a mismatch between expert-led and fully automated claim verification. To mitigate this gap, we posit human-AI collaboration as a more promising path forward, where expert feedback, grounded in real-world knowledge and domain expertise, guides the model's reasoning. However, existing LRMs are difficult to calibrate with natural-language feedback, particularly in a multi-turn interaction setup. We propose Co-FactChecker, a framework for human-AI collaborative claim verification. We introduce a new interaction paradigm that treats the model's thinking trace as a shared scratchpad. Co-FactChecker translates expert feedback into trace-edits that introduce targeted modifications to the trace, sidestepping the shortcomings of dialogue-based interaction. We provide theoretical results showing that trace-editing offers advantages over multi-turn dialogue, and our automatic evaluations demonstrate that Co-FactChecker outperforms existing autonomous and human-AI collaboration approaches. Human evaluations further show that Co-FactChecker is preferred over multi-turn dialogue, producing higher-quality reasoning and verdicts along with thinking traces that are easier to interpret and more useful.
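The paper does not publish an implementation, but the trace-editing idea can be sketched concretely. In the sketch below, a reasoning trace is a list of steps, a `TraceEdit` is a hypothetical container for an expert's targeted correction, and `apply_trace_edits` rewrites the affected steps and truncates everything after the earliest edit so the model can re-reason from the corrected point. The names, the data shapes, and the truncation policy are all assumptions for illustration, not the authors' actual design.

```python
from dataclasses import dataclass


@dataclass
class TraceEdit:
    """A hypothetical expert correction targeting one step of a reasoning trace."""
    step_index: int   # which step of the trace to rewrite
    new_text: str     # the expert's corrected reasoning for that step


def apply_trace_edits(trace: list[str], edits: list[TraceEdit]) -> list[str]:
    """Apply targeted edits to a thinking trace.

    Assumed policy (not from the paper): steps after the earliest edited
    step are dropped, since the model would regenerate its continuation
    from the corrected reasoning rather than keep stale downstream steps.
    """
    revised = list(trace)
    cut = len(revised)  # index after which steps are discarded
    for edit in sorted(edits, key=lambda e: e.step_index):
        revised[edit.step_index] = edit.new_text
        cut = min(cut, edit.step_index + 1)
    return revised[:cut]


# Usage: an expert corrects a faulty evidence-interpretation step.
trace = [
    "Step 1: The claim says X was founded in 1990.",
    "Step 2: Source A says 1990, so the claim is supported.",
    "Step 3: Verdict: TRUE.",
]
edit = TraceEdit(step_index=1, new_text="Step 2: Source A actually says 1991, contradicting the claim.")
revised = apply_trace_edits(trace, [edit])
# revised keeps step 1, replaces step 2, and drops the now-stale verdict
```

The appeal over multi-turn dialogue, as the paper frames it, is locality: the edit lands exactly where the reasoning went wrong instead of being a conversational hint the model must first localize itself.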