AtomEval: Atomic Evaluation of Adversarial Claims in Fact Verification

arXiv cs.CL / 4/10/2026


Key Points

  • AtomEval is presented as a validity-aware evaluation framework for fact-checking under adversarial claim rewriting, addressing shortcomings of standard surface-similarity metrics.
  • The method decomposes claims into subject–relation–object–modifier (SROM) atoms and uses Atomic Validity Scoring (AVS) to detect truth-conditional factual corruption.
  • Experiments on FEVER against multiple attack strategies and LLM generators indicate AtomEval yields more reliable evaluation signals than conventional metrics in the authors’ setup.
  • Using AtomEval, the paper finds that stronger LLM adversarial generators do not always produce more effective adversarial claims, suggesting limitations in prior adversarial evaluation methods.
  • Overall, the work emphasizes better alignment between evaluation criteria and semantic validity for robustness testing of fact verification systems.

Abstract

Adversarial claim rewriting is widely used to test fact-checking systems, but standard metrics fail to capture truth-conditional consistency and often label semantically corrupted rewrites as successful. We introduce AtomEval, a validity-aware evaluation framework that decomposes claims into subject-relation-object-modifier (SROM) atoms and scores adversarial rewrites with Atomic Validity Scoring (AVS), enabling detection of factual corruption beyond surface similarity. Experiments on the FEVER dataset across representative attack strategies and LLM generators show that AtomEval provides more reliable evaluation signals than conventional metrics. Using AtomEval, we further analyze LLM-based adversarial generators and observe that stronger models do not necessarily produce more effective adversarial claims under validity-aware evaluation, highlighting previously overlooked limitations in current adversarial evaluation practices.
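To make the SROM/AVS idea concrete, here is a minimal sketch of how atomic validity scoring might work. The paper does not specify its decomposition or scoring rule, so everything below (the `SROMAtom` structure, the `avs` function, and the example claims) is an illustrative assumption, not the authors' implementation: a rewrite is "valid" to the extent that the truth-conditional atoms of the original claim survive it.

```python
# Illustrative sketch only: the SROMAtom fields and the toy AVS rule below
# are assumptions, not the paper's actual decomposition or scoring method.
from dataclasses import dataclass


@dataclass(frozen=True)
class SROMAtom:
    """One subject-relation-object-modifier atom extracted from a claim."""
    subject: str
    relation: str
    obj: str
    modifier: str = ""


def avs(original: list[SROMAtom], rewrite: list[SROMAtom]) -> float:
    """Toy Atomic Validity Score: fraction of the original claim's atoms
    whose truth-conditional content is preserved in the rewrite."""
    if not original:
        return 1.0
    preserved = sum(1 for atom in original if atom in rewrite)
    return preserved / len(original)


# Original claim: "Paris is the capital of France."
orig = [SROMAtom("Paris", "capital_of", "France")]

# A paraphrase keeps the atom; a corrupted rewrite swaps the object,
# which surface-similarity metrics may still score as a near-match.
paraphrase = [SROMAtom("Paris", "capital_of", "France")]
corrupted = [SROMAtom("Paris", "capital_of", "Italy")]

print(avs(orig, paraphrase))  # 1.0 -> truth-conditionally valid rewrite
print(avs(orig, corrupted))   # 0.0 -> factual corruption caught by AVS
```

The point of the sketch is the contrast in the last two lines: the corrupted rewrite is lexically almost identical to the original, so edit-distance or embedding-similarity metrics would rate it highly, while an atom-level comparison flags the changed object immediately.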