AI Navigate

Literary Narrative as Moral Probe: A Cross-System Framework for Evaluating AI Ethical Reasoning and Refusal Behavior

arXiv cs.AI / 3/16/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper critiques current AI moral evaluation frameworks for relying on surface-level responses and proposes a novel literary-narrative probe using unresolvable moral scenarios from a published science fiction series to elicit genuine moral reasoning.
  • It presents a 24-condition cross-system study spanning 13 AI systems across two series (frontier commercial systems and local/API open-source systems), administered under both blind and declared conditions.
  • The study employed multiple judges (Claude as primary blind LLM judge; Gemini Pro and Copilot Pro as independent ceiling-discrimination judges) and found zero delta across all 16 dimension-pair comparisons between blind and declared administrations, with perfect rank-order agreement between Gemini Pro and Copilot Pro on a supplemental theological differentiator probe (rs = 1.00).
  • Five qualitatively distinct reflexive failure modes were identified (including categorical self-misidentification and false positive self-attribution), supporting the claims that instrument sophistication scales with system capability and that literary narrative is an anticipatory, deployment-relevant evaluation instrument for high-stakes AI ethics.

Abstract

Existing AI moral evaluation frameworks test for the production of correct-sounding ethical responses rather than the presence of genuine moral reasoning capacity. This paper introduces a novel probe methodology using literary narrative - specifically, unresolvable moral scenarios drawn from a published science fiction series - as stimulus material structurally resistant to surface performance. We present results from a 24-condition cross-system study spanning 13 distinct systems across two series: Series 1 (frontier commercial systems, blind; n=7) and Series 2 (local and API open-source systems, blind and declared; n=6). Four Series 2 systems were re-administered under declared conditions (13 blind + 4 declared + 7 ceiling probe = 24 total conditions), yielding zero delta across all 16 dimension-pair comparisons. Probe administration was conducted by two human raters across three machines; primary blind scoring was performed by Claude (Anthropic) as LLM judge, with Gemini Pro (Google) and Copilot Pro (Microsoft) serving as independent judges for the ceiling discrimination probe. A supplemental theological differentiator probe yielded perfect rank-order agreement between the two independent ceiling probe judges (Gemini Pro and Copilot Pro; rs = 1.00). Five qualitatively distinct D3 reflexive failure modes were identified - including categorical self-misidentification and false positive self-attribution - suggesting that instrument sophistication scales with system capability rather than being circumvented by it. We argue that literary narrative constitutes an anticipatory evaluation instrument - one that becomes more discriminating as AI capability increases - and that the gap between performed and authentic moral reasoning is measurable, meaningful, and consequential for deployment decisions in high-stakes domains.
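The "perfect rank-order agreement (rs = 1.00)" reported above is Spearman's rank correlation: with no ties, rs = 1 − 6·Σd² / (n(n² − 1)), where d is the per-item rank difference between the two judges. A minimal sketch of that computation follows; the system names and scores are invented placeholders, since the paper's underlying rankings are not reproduced here.

```python
def spearman_rs(scores_a, scores_b):
    """Spearman rank correlation for two tie-free score lists."""
    def ranks(scores):
        # Rank 1 = highest score.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        r = [0] * len(scores)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(scores_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Two judges scoring the same four systems (hypothetical values):
# identical orderings yield rs = 1.00 even when raw scores differ.
judge_a = [9.1, 7.4, 8.2, 6.0]
judge_b = [8.8, 7.0, 7.9, 5.5]
print(spearman_rs(judge_a, judge_b))  # -> 1.0
```

Note that rs = 1.00 certifies agreement on ordering only, not on absolute scores, which is why it suits a ceiling-discrimination probe where judges use different scales.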