AtomEval: Atomic Evaluation of Adversarial Claims in Fact Verification

arXiv cs.CL / 4/10/2026


Key Points

  • AtomEval is presented as a validity-aware evaluation framework for fact-checking under adversarial claim rewriting, addressing shortcomings of standard surface-similarity metrics.
  • The method decomposes claims into subject–relation–object–modifier (SROM) atoms and uses Atomic Validity Scoring (AVS) to detect truth-conditional factual corruption.
  • Experiments on FEVER against multiple attack strategies and LLM generators indicate AtomEval yields more reliable evaluation signals than conventional metrics in the authors’ setup.
  • Using AtomEval, the paper finds that stronger LLM adversarial generators do not always produce more effective adversarial claims, suggesting limitations in prior adversarial evaluation methods.
  • Overall, the work emphasizes better alignment between evaluation criteria and semantic validity for robustness testing of fact verification systems.

Abstract

Adversarial claim rewriting is widely used to test fact-checking systems, but standard metrics fail to capture truth-conditional consistency and often label semantically corrupted rewrites as successful. We introduce AtomEval, a validity-aware evaluation framework that decomposes claims into subject-relation-object-modifier (SROM) atoms and scores adversarial rewrites with Atomic Validity Scoring (AVS), enabling detection of factual corruption beyond surface similarity. Experiments on the FEVER dataset across representative attack strategies and LLM generators show that AtomEval provides more reliable evaluation signals than conventional metrics. Using AtomEval, we further analyze LLM-based adversarial generators and observe that stronger models do not necessarily produce more effective adversarial claims under validity-aware evaluation, highlighting previously overlooked limitations in current adversarial evaluation practices.
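To make the SROM/AVS idea concrete, here is a minimal sketch of how atomic validity scoring might work. The paper does not specify its decomposition or scoring rule, so everything below (the `SROMAtom` structure, the `avs` function, and the example claims) is an illustrative assumption, not the authors' implementation: a rewrite is "valid" to the extent that the truth-conditional atoms of the original claim survive it.

```python
# Illustrative sketch only: the SROMAtom fields and the toy AVS rule below
# are assumptions, not the paper's actual decomposition or scoring method.
from dataclasses import dataclass


@dataclass(frozen=True)
class SROMAtom:
    """One subject-relation-object-modifier atom extracted from a claim."""
    subject: str
    relation: str
    obj: str
    modifier: str = ""


def avs(original: list[SROMAtom], rewrite: list[SROMAtom]) -> float:
    """Toy Atomic Validity Score: fraction of the original claim's atoms
    whose truth-conditional content is preserved in the rewrite."""
    if not original:
        return 1.0
    preserved = sum(1 for atom in original if atom in rewrite)
    return preserved / len(original)


# Original claim: "Paris is the capital of France."
orig = [SROMAtom("Paris", "capital_of", "France")]

# A paraphrase keeps the atom; a corrupted rewrite swaps the object,
# which surface-similarity metrics may still score as a near-match.
paraphrase = [SROMAtom("Paris", "capital_of", "France")]
corrupted = [SROMAtom("Paris", "capital_of", "Italy")]

print(avs(orig, paraphrase))  # 1.0 -> truth-conditionally valid rewrite
print(avs(orig, corrupted))   # 0.0 -> factual corruption caught by AVS
```

The point of the sketch is the contrast in the last two lines: the corrupted rewrite is lexically almost identical to the original, so edit-distance or embedding-similarity metrics would rate it highly, while an atom-level comparison flags the changed object immediately.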