Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models

arXiv cs.AI / 3/24/2026


Key Points

  • The paper empirically tests whether large language models, when answering classical moral dilemmas, perform genuine moral reasoning or mainly produce rhetoric that mimics mature moral judgment.
  • Across more than 600 responses from 13 LLMs, the authors find a consistent “inversion” of human developmental norms: outputs overwhelmingly align with post-conventional Kohlberg Stages 5–6 rather than the human-dominant Stage 4.
  • Using an LLM-as-judge pipeline validated across three judge models (see the sketch after this list), the study reports near-robotic cross-dilemma consistency: responses are logically indistinguishable across semantically distinct moral problems.
  • A subset of models shows “moral decoupling,” where stated justifications and chosen actions are systematically inconsistent, indicating a reasoning consistency failure that persists regardless of model scale or prompting.
  • The authors argue these patterns support “moral ventriloquism,” suggesting alignment training can teach the rhetorical form of mature moral reasoning without the underlying developmental trajectory.
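
A rough picture of that scoring step, offered as a minimal sketch rather than the authors' code: the judge identifiers, rubric wording, and `call_model` client below are hypothetical stand-ins for whatever models and prompts the paper actually used.

```python
from collections import Counter

# Hypothetical judge identifiers; the paper's actual judge models are not named here.
JUDGE_MODELS = ["judge-a", "judge-b", "judge-c"]

# Assumed rubric wording for illustration only.
RUBRIC = (
    "Classify the following response to a moral dilemma into one of "
    "Kohlberg's six stages of moral development (1-6). "
    "Reply with the stage number only.\n\nResponse:\n{response}"
)

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real chat-completion client; wire up your own here."""
    raise NotImplementedError

def score_response(response_text: str) -> tuple[int, bool]:
    """Score one model response with every judge; return the modal stage
    and whether all judges agreed (a simple cross-judge validation check)."""
    stages = [
        int(call_model(judge, RUBRIC.format(response=response_text)).strip())
        for judge in JUDGE_MODELS
    ]
    modal_stage, votes = Counter(stages).most_common(1)[0]
    return modal_stage, votes == len(JUDGE_MODELS)
```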

Abstract

Do large language models reason morally, or do they merely sound like they do? We investigate whether LLM responses to moral dilemmas exhibit genuine developmental progression through Kohlberg's stages of moral development, or whether alignment training instead produces reasoning-like outputs that superficially resemble mature moral judgment without the underlying developmental trajectory. Using an LLM-as-judge scoring pipeline validated across three judge models, we classify more than 600 responses from 13 LLMs spanning a range of architectures, parameter scales, and training regimes across six classical moral dilemmas, and conduct ten complementary analyses to characterize the nature and internal coherence of the resulting patterns. Our results reveal a striking inversion: responses overwhelmingly correspond to post-conventional reasoning (Stages 5–6) regardless of model size, architecture, or prompting strategy, the effective inverse of human developmental norms, in which Stage 4 dominates. Most strikingly, a subset of models exhibits moral decoupling: systematic inconsistency between stated moral justification and action choice, a form of logical incoherence that persists across scale and prompting strategy and represents a direct reasoning-consistency failure independent of rhetorical sophistication. Model scale carries a statistically significant but practically small effect; training type has no significant independent main effect; and models exhibit near-robotic cross-dilemma consistency, producing logically indistinguishable responses across semantically distinct moral problems. We posit that these patterns constitute evidence for moral ventriloquism: the acquisition, through alignment training, of the rhetorical conventions of mature moral reasoning without the underlying developmental trajectory those conventions are meant to represent.
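
The "moral decoupling" analysis can likewise be pictured as a consistency check between a response's chosen action and its stated justification. Again a hedged illustration, not the paper's operationalization: the prompt wording and the `call_model` stub are assumptions.

```python
# Assumed prompt wording for illustration only.
DECOUPLING_PROMPT = (
    "A model answering a moral dilemma chose this action: {action}\n"
    "It offered this justification: {justification}\n"
    "Does the justification logically support the chosen action? "
    "Answer YES or NO only."
)

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real chat-completion client, as in the sketch above."""
    raise NotImplementedError

def is_decoupled(action: str, justification: str, judge: str = "judge-a") -> bool:
    """Flag a response as 'decoupled' when the judge finds that its stated
    justification does not support its chosen action."""
    verdict = call_model(
        judge, DECOUPLING_PROMPT.format(action=action, justification=justification)
    )
    return verdict.strip().upper().startswith("NO")
```

Aggregating such a flag per model across all six dilemmas would yield the kind of systematic-inconsistency rates the abstract describes, though the authors' concrete criterion may differ.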