Quantifying Cross-Query Contradictions in Multi-Query LLM Reasoning

arXiv cs.AI · April 17, 2026


Key Points

  • The paper studies the tendency of LLMs to generate mutually inconsistent answers across multiple related queries, framing the problem as one of maintaining a globally satisfiable belief state.
  • It introduces a new benchmark of 390 multi-query reasoning instances labeled as entailment, contradiction, or unknown, along with set-level evaluation metrics such as Case Satisfiability Rate, Contradiction Density, and Revision Cost.
  • A solver-augmented method is proposed that extracts model commitments, checks global satisfiability, and uses counterexample-guided repair to fix inconsistencies.
  • Experiments across four reasoning domains show the approach substantially reduces cross-query contradictions, raising set-level consistency (SetCons) from 0.56 to 0.94 without sacrificing per-query accuracy, underscoring the importance of global coherence.
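The extract → check → repair loop can be illustrated with a toy propositional model. The clause encoding, brute-force satisfiability check, and greedy repair heuristic below are illustrative assumptions for intuition only, not the paper's implementation (which uses a solver and counterexample-guided repair):

```python
from itertools import product

# Each commitment extracted from an answer is modeled as a clause:
# a set of (variable, polarity) literals that must contain a true literal.
Clause = frozenset

def satisfiable(clauses):
    """Brute-force check that all commitments can hold simultaneously.
    Returns a satisfying assignment, or None if globally unsatisfiable."""
    vars_ = sorted({v for c in clauses for v, _ in c})
    for values in product([False, True], repeat=len(vars_)):
        assign = dict(zip(vars_, values))
        if all(any(assign[v] == pol for v, pol in c) for c in clauses):
            return assign
    return None

def repair(clauses):
    """Toy repair heuristic (an assumption, not the paper's method):
    greedily drop commitments until the remaining set is satisfiable."""
    clauses, dropped = list(clauses), []
    while satisfiable(clauses) is None:
        for i, c in enumerate(clauses):
            rest = clauses[:i] + clauses[i + 1:]
            if satisfiable(rest) is not None:
                dropped.append(c)
                clauses = rest
                break
        else:  # no single removal helps; drop one and keep iterating
            dropped.append(clauses.pop())
    return clauses, dropped

# Three per-query answers, each locally plausible, jointly contradictory:
commitments = [
    Clause({("A", True)}),                # query 1: "A holds"
    Clause({("A", False), ("B", True)}),  # query 2: "if A then B"
    Clause({("B", False)}),               # query 3: "B does not hold"
]
assert satisfiable(commitments) is None   # globally contradictory
kept, dropped = repair(commitments)
assert satisfiable(kept) is not None      # repaired set is consistent
```

The example shows why per-query checks are insufficient: no single answer is wrong in isolation, yet the set has no consistent model, which is exactly what a global satisfiability check detects.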

Abstract

Large language models frequently produce mutually inconsistent answers when reasoning over multiple related queries. We study case-file logical consistency: maintaining a globally satisfiable belief state across interdependent queries. We introduce a benchmark of 390 multi-query reasoning instances with entailment/contradiction/unknown labels and propose set-level metrics including Case Satisfiability Rate, Contradiction Density and Revision Cost. Our solver-augmented approach extracts commitments, verifies global satisfiability and performs counterexample-guided repair. Across four reasoning domains, our method substantially reduces cross-query contradictions (SetCons: 0.56 to 0.94) while preserving per-query accuracy, demonstrating that global coherence is critical for robust multi-query reasoning.
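For intuition, set-level metrics of this kind can be defined over pairwise labels and per-case satisfiability checks. The definitions below are plausible readings of the metric names in the abstract, not the paper's exact formulas:

```python
def contradiction_density(pair_labels):
    """Fraction of query pairs labeled 'contradiction' (assumed definition).
    `pair_labels` maps unordered query-id pairs to a label in
    {'entailment', 'contradiction', 'unknown'}."""
    if not pair_labels:
        return 0.0
    n_contra = sum(lab == "contradiction" for lab in pair_labels.values())
    return n_contra / len(pair_labels)

def case_satisfiability_rate(case_is_satisfiable):
    """Fraction of multi-query cases whose full answer set is globally
    satisfiable (assumed definition)."""
    return sum(case_is_satisfiable) / len(case_is_satisfiable)

# Hypothetical labels for one three-query case:
pairs = {("q1", "q2"): "entailment",
         ("q1", "q3"): "contradiction",
         ("q2", "q3"): "unknown"}
print(contradiction_density(pairs))                         # 1 of 3 pairs
print(case_satisfiability_rate([True, True, False, True]))  # 0.75
```

Under these readings, a lower Contradiction Density and a higher Case Satisfiability Rate both indicate a more coherent belief state across the query set.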