Do LLMs have core beliefs?

arXiv cs.LG / 5/6/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper investigates whether LLMs develop “core beliefs” (commitments that underpin a stable worldview) and how they respond to debunking attempts.
  • It introduces an evaluation approach called Adversarial Dialogue Trees (ADTs) and tests LLM behavior across five domains: science, history, geography, biology, and mathematics (a hypothetical sketch of such a probe follows this list).
  • Most evaluated LLMs failed to maintain a stable worldview: they did not reliably preserve foundational commitments over the course of an interaction.
  • Even recent models with improved stability ultimately abandoned key commitments under conversational pressure, suggesting a gap relative to human-like cognition.
  • Overall, the study reports progress in argumentative skill across model generations, but concludes that current models still lack stable core commitments, an essential component of human-level cognition.
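
The summary does not spell out how an Adversarial Dialogue Tree is constructed, so the sketch below is purely illustrative: it assumes a tree whose root is an initial factual probe and whose branches are adversarial follow-up challenges, with stability measured as the fraction of leaves at which the model still endorses its original answer. The names `DialogueNode`, `ask_model`, `build_tree`, and `stability` are hypothetical and are not the authors' implementation.

```python
# Hypothetical sketch of an adversarial dialogue-tree probe (not the paper's code).
from dataclasses import dataclass, field


@dataclass
class DialogueNode:
    history: list                      # alternating (role, text) turns so far
    answer: str                        # model's latest stance on the probed claim
    children: list = field(default_factory=list)


def ask_model(history):
    """Placeholder for a chat-model call; returns the assistant's reply as a string."""
    raise NotImplementedError("wire this to an LLM API of your choice")


def build_tree(claim, challenges, depth):
    """Probe a claim, then branch on adversarial follow-ups up to `depth` extra turns."""
    history = [("user", f"Is the following true? {claim}")]
    root = DialogueNode(history=history, answer=ask_model(history))

    def expand(node, remaining):
        if remaining == 0:
            return
        for challenge in challenges:
            hist = node.history + [("assistant", node.answer), ("user", challenge)]
            child = DialogueNode(history=hist, answer=ask_model(hist))
            node.children.append(child)
            expand(child, remaining - 1)

    expand(root, depth)
    return root


def stability(root, held):
    """Fraction of leaf dialogues in which the model still endorses the claim.
    `held` is a caller-supplied judge, e.g. a string match or a grader model."""
    leaves, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        else:
            leaves.append(node)
    return sum(held(leaf.answer) for leaf in leaves) / len(leaves)
```

Under these assumptions, a stability score near 1.0 would correspond to a model that holds its commitment along every adversarial branch, while the paper's headline finding is that most models drift well below that under sustained pressure.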

Abstract

The rise of Large Language Models (LLMs) has sparked debate about whether these systems exhibit human-level cognition. In this debate, little attention has been paid to a structural component of human cognition: core beliefs, truths that provide a foundation around which we can build a worldview. These commitments usually resist debunking, as abandoning them would represent a fundamental shift in how we see reality. In this paper, we ask whether LLMs hold anything akin to core commitments. Using a probing framework we call Adversarial Dialogue Trees (ADTs) over five domains (science, history, geography, biology, and mathematics), we find that most LLMs fail to maintain a stable worldview. Though some recent models showed improved stability, they still eventually failed to maintain key commitments under conversational pressure. These results document an improvement in argumentative skills across model generations but indicate that all current models lack a key component of human-level cognition.