What does a system modify when it modifies itself?

arXiv cs.AI / 3/31/2026

Opinion · Signals & Early Trends · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper asks what a cognitive system changes when it self-modifies—whether it updates low-level rules, control rules, or the evaluative norms behind its revisions—and argues that both cognitive science and contemporary AI lack a shared formal framework to distinguish these targets.
  • It proposes a minimal formal structure for self-modifying systems—rule hierarchies, a fixed core, and a separation between effective, represented, and causally accessible rules—and derives four self-modification regimes (action-only, low-level modification, structural modification, and teleological revision).
  • Applying the framework to humans, the authors claim a “crossing of opacities”: causal power and self-representation concentrate at higher hierarchical levels, while lower operational levels remain comparatively opaque.
  • For reflexive AI systems, the paper argues for the inverse pattern: operational levels have rich representation and causal accessibility, whereas the highest evaluative level has neither.
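  • The framework is linked to theories of artificial consciousness, with higher-order theories and Attention Schema Theory treated as special cases. It yields four testable predictions and identifies four open problems: the independence of transformativity and autonomy, the viability of self-modification, the teleological lock, and identity under transformation.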

Abstract

When a cognitive system modifies its own functioning, what exactly does it modify: a low-level rule, a control rule, or the norm that evaluates its own revisions? Cognitive science describes executive control, metacognition, and hierarchical learning with precision, but lacks a formal framework distinguishing these targets of transformation. Contemporary artificial intelligence likewise exhibits self-modification without common criteria for comparison with biological cognition. We show that the question of what counts as a self-modifying system entails a minimal structure: a hierarchy of rules, a fixed core, and a distinction between effective rules, represented rules, and causally accessible rules. Four regimes are identified: (1) action without modification, (2) low-level modification, (3) structural modification, and (4) teleological revision. Each regime is anchored in a cognitive phenomenon and a corresponding artificial system. Applied to humans, the framework yields a central result: a crossing of opacities. Humans have self-representation and causal power concentrated at upper hierarchical levels, while operational levels remain largely opaque. Reflexive artificial systems display the inverse profile: rich representation and causal access at operational levels, but none at the highest evaluative level. This crossed asymmetry provides a structural signature for human-AI comparison. The framework also offers insight into artificial consciousness, with higher-order theories and Attention Schema Theory as special cases. We derive four testable predictions and identify four open problems: the independence of transformativity and autonomy, the viability of self-modification, the teleological lock, and identity under transformation.
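One way to make the abstract's vocabulary concrete is a small Python sketch. It is an illustration under stated assumptions, not the paper's formalism. It assumes the hierarchy has exactly the three levels the abstract names as targets, and that each regime is determined solely by the highest level being modified. It also reduces "represented" and "causally accessible" to booleans. All names (`Level`, `Access`, `regime_for`, `is_crossed`) are hypothetical. The two profiles encode only what the abstract states about the lowest and highest levels; the middle level is left out because the abstract does not specify it for either system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Level(Enum):
    # Assumed three-level hierarchy, taken from the targets named in the abstract.
    LOW_LEVEL = 0   # operational rules
    CONTROL = 1     # rules that govern other rules
    EVALUATIVE = 2  # the norm that evaluates the system's own revisions


class Regime(Enum):
    ACTION_ONLY = 1
    LOW_LEVEL_MODIFICATION = 2
    STRUCTURAL_MODIFICATION = 3
    TELEOLOGICAL_REVISION = 4


@dataclass(frozen=True)
class Access:
    represented: bool          # the system has a model of its rule at this level
    causally_accessible: bool  # the system can act on the rule at this level


def regime_for(target: Optional[Level]) -> Regime:
    """Map the modified level (None = nothing modified) to a regime.

    The level-to-regime mapping is an assumption made for this sketch.
    """
    if target is None:
        return Regime.ACTION_ONLY
    return {
        Level.LOW_LEVEL: Regime.LOW_LEVEL_MODIFICATION,
        Level.CONTROL: Regime.STRUCTURAL_MODIFICATION,
        Level.EVALUATIVE: Regime.TELEOLOGICAL_REVISION,
    }[target]


def can_modify(profile: Dict[Level, Access], level: Level) -> bool:
    """A level counts as modifiable here only if it is both represented
    and causally accessible (a simplifying assumption)."""
    access = profile[level]
    return access.represented and access.causally_accessible


# Coarse boolean readings of the abstract: "largely opaque" becomes False.
HUMAN = {
    Level.LOW_LEVEL: Access(represented=False, causally_accessible=False),
    Level.EVALUATIVE: Access(represented=True, causally_accessible=True),
}
REFLEXIVE_AI = {
    Level.LOW_LEVEL: Access(represented=True, causally_accessible=True),
    Level.EVALUATIVE: Access(represented=False, causally_accessible=False),
}


def is_crossed(a: Dict[Level, Access], b: Dict[Level, Access]) -> bool:
    """True when the two profiles are mirror images at the lowest and highest levels."""
    ends = (Level.LOW_LEVEL, Level.EVALUATIVE)
    mirrored = all(can_modify(a, lvl) != can_modify(b, lvl) for lvl in ends)
    asymmetric = can_modify(a, Level.LOW_LEVEL) != can_modify(a, Level.EVALUATIVE)
    return mirrored and asymmetric
```

Under these assumptions, `is_crossed(HUMAN, REFLEXIVE_AI)` holds, which is the "crossing of opacities" in its simplest form. Graded rather than boolean access would be closer to the abstract's wording ("largely opaque", "rich representation") but is omitted to keep the sketch short.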