When Self-Reference Fails to Close: Matrix-Level Dynamics in Large Language Models

arXiv cs.CL, April 15, 2026


Key Points

  • The arXiv paper analyzes how self-referential prompts change internal matrix-level dynamics in large language models, using 106 scalar metrics and up to seven analysis passes across four models from three architecture families.
  • It finds that self-reference is generally stable when it is grounded or framed as meta-cognition, while paradoxical self-reference is more likely to trigger key instability signals.
  • The main instability source is identified as non-closing truth recursion (NCTR), where truth-value computation fails to reach any finite-depth resolution.
  • NCTR prompts show anomalously elevated attention effective rank and disrupted per-layer SVD patterns across all sampled layers, indicating global attention reorganization rather than simple collapse.
  • The authors connect the findings to classical matrix-semigroup problems, propose a conjecture linking NCTR to specific dynamical regimes in finite-depth transformers, and report that NCTR prompts produce contradictory outputs at a substantially higher rate.
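
The summary's central metric, "attention effective rank," is not defined here. A common definition (Roy & Vetterli's effective rank) is the exponential of the Shannon entropy of the normalized singular values of a matrix; the sketch below assumes that reading and applies it to toy attention matrices, where a globally dispersed, per-query-diverse pattern scores high and a pattern where every query attends to one key scores low.

```python
import numpy as np

def effective_rank(A: np.ndarray) -> float:
    """Effective rank = exp(entropy of normalized singular values).
    This is one plausible reading of the paper's 'attention effective
    rank' metric, not the authors' confirmed definition."""
    s = np.linalg.svd(A, compute_uv=False)
    s = s[s > 1e-12]              # drop numerically zero singular values
    p = s / s.sum()               # normalize to a distribution
    return float(np.exp(-(p * np.log(p)).sum()))

# Every query attends to key 0: rank-one attention, effective rank ~1.
focused = np.zeros((8, 8))
focused[:, 0] = 1.0

# Each query attends to a distinct key: maximally diverse rows,
# effective rank equals the full dimension, 8.
diverse = np.eye(8)

print(effective_rank(focused))   # -> 1.0
print(effective_rank(diverse))   # -> 8.0
```

On this definition, the "anomalously elevated" values the paper reports for NCTR prompts would correspond to attention mass spread across many independent directions rather than concentrated on a few, consistent with the summary's phrase "global dispersion rather than simple concentration collapse."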

Abstract

We investigate how self-referential inputs alter the internal matrix dynamics of large language models. Measuring 106 scalar metrics across up to 7 analysis passes on four models from three architecture families -- Qwen3-VL-8B, Llama-3.2-11B, Llama-3.3-70B, and Gemma-2-9B -- over 300 prompts in a 14-level hierarchy at three temperatures (T ∈ {0.0, 0.3, 0.7}), we find that self-reference alone is not destabilizing: grounded self-referential statements and meta-cognitive prompts are markedly more stable than paradoxical self-reference on key collapse-related metrics, and on several such metrics can be as stable as factual controls. Instability concentrates in prompts inducing non-closing truth recursion (NCTR) -- truth-value computations with no finite-depth resolution. NCTR prompts produce anomalously elevated attention effective rank -- indicating attention reorganization with global dispersion rather than simple concentration collapse -- and key metrics reach Cohen's d = 3.14 (attention effective rank) to 3.52 (variance kurtosis) vs. stable self-reference in the 70B model; 281/397 metric-model combinations differentiate NCTR from stable self-reference after FDR correction (q < 0.05), 198 with |d| > 0.8. Per-layer SVD confirms disruption at every sampled layer (d > +1.0 in all three models analyzed), ruling out aggregation artifacts. A classifier achieves AUC 0.81-0.90; 30 minimal pairs yield 42/387 significant combinations; 43/106 metrics replicate across all four models. We connect these observations to three classical matrix-semigroup problems and propose, as a conjecture, that NCTR forces finite-depth transformers toward dynamical regimes where these problems concentrate. NCTR prompts also produce elevated contradictory output (+34-56 percentage points vs. controls), suggesting practical relevance for understanding self-referential failure modes.
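
The abstract's headline numbers rest on two standard statistical tools: Cohen's d (a pooled-standard-deviation effect size, where |d| > 0.8 is conventionally "large") and Benjamini-Hochberg FDR correction at q < 0.05 for the 397 metric-model tests. A minimal, self-contained sketch of both, not the authors' code:

```python
import numpy as np

def cohens_d(x, y) -> float:
    """Cohen's d with pooled standard deviation (Welch-style pooling
    is a common variant; this is the classic two-sample form)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * np.var(x, ddof=1) +
                         (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2))
    return float((np.mean(x) - np.mean(y)) / pooled_sd)

def bh_fdr(pvals, q: float = 0.05) -> np.ndarray:
    """Benjamini-Hochberg step-up procedure: returns a boolean mask of
    hypotheses rejected at false-discovery rate q."""
    p = np.asarray(pvals, float)
    m = len(p)
    order = np.argsort(p)
    # Compare sorted p-values against the BH thresholds q*k/m.
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = int(np.nonzero(below)[0].max()) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# Toy per-prompt metric values for an NCTR vs. stable-self-reference
# comparison (illustrative data only).
nctr   = [5.1, 5.4, 4.9, 5.6]
stable = [3.0, 3.2, 2.8, 3.1]
d = cohens_d(nctr, stable)       # large positive effect

mask = bh_fdr([0.001, 0.02, 0.8], q=0.05)  # rejects the first two tests
```

Under this procedure, a claim like "281/397 combinations significant after FDR correction, 198 with |d| > 0.8" amounts to running `bh_fdr` over the 397 p-values and then counting which rejected tests also clear the large-effect-size threshold.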