Flying Pigs, FaR and Beyond: Evaluating LLM Reasoning in Counterfactual Worlds

arXiv cs.CL / 3/25/2026


Key Points

  • The paper studies whether large language models can maintain logical reasoning in counterfactual (“Counterfactual Worlds”) settings where the prompt contradicts the model’s learned parametric knowledge.
  • The authors introduce CounterLogic, a benchmark designed to disentangle logical validity from knowledge alignment, and evaluate 11 LLMs across six reasoning datasets, finding a consistent drop of about 14% in counterfactual accuracy compared with knowledge-aligned conditions.
  • The results suggest the primary issue is not logical computation itself, but difficulty handling cognitive conflict between the provided context and internal knowledge.
  • Inspired by human metacognition, the paper proposes Flag & Reason (FaR), a two-step prompting approach where the model first flags potential knowledge conflicts before reasoning.
  • FaR substantially improves robustness, reducing the performance gap to roughly 7% and increasing overall accuracy by about 4% versus standard prompting.
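The two-step structure of FaR can be sketched as a small prompting pipeline. The prompt templates and the `call_model` interface below are illustrative assumptions, not the paper's actual wording; the only detail taken from the source is the order of the steps (flag conflicts first, then reason):

```python
# Hedged sketch of a FaR-style ("Flag & Reason") two-step prompt pipeline.
# The exact prompt wording is not given in the source; these templates and
# the `call_model` callable are illustrative placeholders.

FLAG_PROMPT = (
    "Before answering, list any statements in the context that conflict "
    "with commonly accepted world knowledge.\n\n"
    "Context:\n{context}\n\nConflicts:"
)

REASON_PROMPT = (
    "Treat the context as ground truth, even where it conflicts with "
    "world knowledge (noted conflicts: {flags}).\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

def flag_and_reason(call_model, context, question):
    """Step 1: flag knowledge conflicts. Step 2: reason with flags in view."""
    flags = call_model(FLAG_PROMPT.format(context=context))
    answer = call_model(
        REASON_PROMPT.format(flags=flags, context=context, question=question)
    )
    return flags, answer
```

Here `call_model` stands in for any text-in/text-out LLM call; the key design point is that the conflict flags from step one are fed back into the reasoning prompt of step two.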

Abstract

A fundamental challenge in reasoning is navigating hypothetical, counterfactual worlds where logic may conflict with ingrained knowledge. We investigate this frontier for Large Language Models (LLMs) by asking: Can LLMs reason logically when the context contradicts their parametric knowledge? To facilitate a systematic analysis, we first introduce CounterLogic, a benchmark specifically designed to disentangle logical validity from knowledge alignment. Evaluation of 11 LLMs across six diverse reasoning datasets reveals a consistent failure: model accuracy plummets by an average of 14% in counterfactual scenarios compared to knowledge-aligned ones. We hypothesize that this gap stems not from a flaw in logical processing, but from an inability to manage the cognitive conflict between context and knowledge. Inspired by human metacognition, we propose a simple yet powerful intervention: Flag & Reason (FaR), where models are first prompted to flag potential knowledge conflicts before they reason. This metacognitive step is highly effective, narrowing the performance gap to just 7% and increasing overall accuracy by 4%. Our findings diagnose and study a critical limitation in modern LLMs' reasoning and demonstrate how metacognitive awareness can make them more robust and reliable thinkers.
