Quantifying Cross-Query Contradictions in Multi-Query LLM Reasoning

arXiv cs.AI · April 17, 2026


Key Points

  • The paper studies the tendency of LLMs to generate mutually inconsistent answers across multiple related queries, framing the problem as one of maintaining a globally satisfiable belief state.
  • It introduces a new benchmark of 390 multi-query reasoning instances labeled as entailment, contradiction, or unknown, along with set-level evaluation metrics such as Case Satisfiability Rate, Contradiction Density, and Revision Cost.
  • A solver-augmented method is proposed that extracts model commitments, checks global satisfiability, and uses counterexample-guided repair to fix inconsistencies.
  • Experiments across four reasoning domains show the approach substantially reduces cross-query contradictions, raising set-level consistency (SetCons) from 0.56 to 0.94 without sacrificing per-query accuracy, underscoring the importance of global coherence.
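The extract → check → repair loop can be illustrated with a toy propositional model. The clause encoding, brute-force satisfiability check, and greedy repair heuristic below are illustrative assumptions for intuition only, not the paper's implementation (which uses a solver and counterexample-guided repair):

```python
from itertools import product

# Each commitment extracted from an answer is modeled as a clause:
# a set of (variable, polarity) literals that must contain a true literal.
Clause = frozenset

def satisfiable(clauses):
    """Brute-force check that all commitments can hold simultaneously.
    Returns a satisfying assignment, or None if globally unsatisfiable."""
    vars_ = sorted({v for c in clauses for v, _ in c})
    for values in product([False, True], repeat=len(vars_)):
        assign = dict(zip(vars_, values))
        if all(any(assign[v] == pol for v, pol in c) for c in clauses):
            return assign
    return None

def repair(clauses):
    """Toy repair heuristic (an assumption, not the paper's method):
    greedily drop commitments until the remaining set is satisfiable."""
    clauses, dropped = list(clauses), []
    while satisfiable(clauses) is None:
        for i, c in enumerate(clauses):
            rest = clauses[:i] + clauses[i + 1:]
            if satisfiable(rest) is not None:
                dropped.append(c)
                clauses = rest
                break
        else:  # no single removal helps; drop one and keep iterating
            dropped.append(clauses.pop())
    return clauses, dropped

# Three per-query answers, each locally plausible, jointly contradictory:
commitments = [
    Clause({("A", True)}),                # query 1: "A holds"
    Clause({("A", False), ("B", True)}),  # query 2: "if A then B"
    Clause({("B", False)}),               # query 3: "B does not hold"
]
assert satisfiable(commitments) is None   # globally contradictory
kept, dropped = repair(commitments)
assert satisfiable(kept) is not None      # repaired set is consistent
```

The example shows why per-query checks are insufficient: no single answer is wrong in isolation, yet the set has no consistent model, which is exactly what a global satisfiability check detects.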

Abstract

Large language models frequently produce mutually inconsistent answers when reasoning over multiple related queries. We study case-file logical consistency: maintaining a globally satisfiable belief state across interdependent queries. We introduce a benchmark of 390 multi-query reasoning instances with entailment/contradiction/unknown labels and propose set-level metrics including Case Satisfiability Rate, Contradiction Density and Revision Cost. Our solver-augmented approach extracts commitments, verifies global satisfiability and performs counterexample-guided repair. Across four reasoning domains, our method substantially reduces cross-query contradictions (SetCons: 0.56 to 0.94) while preserving per-query accuracy, demonstrating that global coherence is critical for robust multi-query reasoning.
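For intuition, set-level metrics of this kind can be defined over pairwise labels and per-case satisfiability checks. The definitions below are plausible readings of the metric names in the abstract, not the paper's exact formulas:

```python
def contradiction_density(pair_labels):
    """Fraction of query pairs labeled 'contradiction' (assumed definition).
    `pair_labels` maps unordered query-id pairs to a label in
    {'entailment', 'contradiction', 'unknown'}."""
    if not pair_labels:
        return 0.0
    n_contra = sum(lab == "contradiction" for lab in pair_labels.values())
    return n_contra / len(pair_labels)

def case_satisfiability_rate(case_is_satisfiable):
    """Fraction of multi-query cases whose full answer set is globally
    satisfiable (assumed definition)."""
    return sum(case_is_satisfiable) / len(case_is_satisfiable)

# Hypothetical labels for one three-query case:
pairs = {("q1", "q2"): "entailment",
         ("q1", "q3"): "contradiction",
         ("q2", "q3"): "unknown"}
print(contradiction_density(pairs))                         # 1 of 3 pairs
print(case_satisfiability_rate([True, True, False, True]))  # 0.75
```

Under these readings, a lower Contradiction Density and a higher Case Satisfiability Rate both indicate a more coherent belief state across the query set.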