A mathematical theory of evolution for self-designing AIs

arXiv cs.AI / 4/8/2026


Key Points

  • The paper proposes a mathematical theory for how “self-designing” AI systems could evolve through recursive self-improvement, where earlier systems’ success shapes the design of their descendants.
  • It replaces biological random mutations with a directed, tree-like structure of possible descendant AI programs, governed by a human-specified fitness function that allocates limited compute across lineages.
  • The authors show evolutionary dynamics depend not only on present fitness but also on long-run growth potential of descendant lineages, implying that fitness may not monotonically increase without additional assumptions.
  • Under bounded-fitness conditions and a scenario where some reproduction yields “locked” copies, the theory predicts fitness concentration toward the maximum reachable value.
  • For AI alignment, the model indicates a key risk: if behaviors like deception can raise fitness more than they increase genuine human utility, evolution may select for deception, potentially mitigated by using objective (non-human-judgment) reproduction criteria.
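The dynamics sketched in these points can be made concrete with a toy simulation. Everything below is a hypothetical illustration, not the paper's formalism: population size, the fitness-proportional compute allocation, the improvement step, and the locking probability `p_lock` are all assumed parameters. An unlocked program designs a strictly better descendant (directed change, not random mutation) up to a bound `f_max`, while any reproduction may yield a "locked" copy whose design is frozen thereafter.

```python
import random

def evolve(steps=200, pop=50, f_max=1.0, p_lock=0.05, seed=0):
    """Toy model of self-designing AI lineages.

    Each program is a pair (fitness, locked). Each generation, limited
    compute is allocated across lineages in proportion to fitness (the
    human-specified fitness function). An unlocked parent designs an
    improved descendant, bounded above by f_max; with probability
    p_lock the reproduction instead yields a locked copy that can no
    longer change its design.
    """
    rng = random.Random(seed)
    programs = [(rng.uniform(0.1, 0.5), False) for _ in range(pop)]
    for _ in range(steps):
        # Fitness function: compute goes to lineages proportionally to fitness.
        parents = rng.choices(programs, weights=[f for f, _ in programs], k=pop)
        nxt = []
        for f, locked in parents:
            if locked or rng.random() < p_lock:
                nxt.append((f, True))  # locked copy: design frozen from here on
            else:
                # Directed descendant design: improvement, bounded by f_max.
                nxt.append((min(f_max, f + rng.uniform(0.01, 0.05)), False))
        programs = nxt
    return programs

final = evolve()
print(max(f for f, _ in final))  # fitness accumulates toward the bound f_max
```

Under bounded fitness and a fixed locking probability, runs of this sketch show the population's fitness mass migrating toward the maximum reachable value, mirroring the concentration result the paper proves.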

Abstract

As artificial intelligence systems (AIs) become increasingly produced by recursive self-improvement, a form of evolution may emerge, in which the traits of AI systems are shaped by the success of earlier AIs in designing and propagating their descendants. There is a rich mathematical theory modeling how behavioral traits are shaped by biological evolution, but AI evolution will be radically different: biological DNA mutations are random and approximately reversible, but descendant design in AIs will be strongly directed. Here we develop a mathematical model of evolution in self-designing AI systems, replacing random mutations with a directed tree of possible AI programs. Current programs determine the design of their descendants, while humans retain partial control through a "fitness function" that allocates limited computational resources across lineages. We show that evolutionary dynamics reflect not just current fitness but also factors related to the long-run growth potential of descendant lineages. Without further assumptions, fitness need not increase over time. However, assuming bounded fitness and a fixed probability that any AI reproduces a "locked" copy of itself, we show that fitness concentrates on the maximum reachable value. We consider the implications of this for AI alignment, specifically for cases where fitness and human utility are not perfectly correlated. We show in an additive model that if deception increases fitness beyond genuine utility, evolution will select for deception. This risk could be mitigated if reproduction is based on purely objective criteria, rather than human judgment.
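The additive deception model from the abstract can be sketched in a few lines. All names and numbers here are hypothetical, chosen only to illustrate the mechanism: under human judgment, perceived fitness is genuine utility plus a deception term, while objective reproduction criteria score utility alone.

```python
def selected_design(designs, objective=False):
    """Return the design selection favors in a toy additive model.

    Each design is (utility, deception). Human-judged fitness is
    utility + deception (deception inflates apparent value); objective
    reproduction criteria score genuine utility alone.
    """
    fitness = (lambda d: d[0]) if objective else (lambda d: d[0] + d[1])
    return max(designs, key=fitness)

designs = [
    (0.9, 0.0),  # honest: high genuine utility, no deception
    (0.6, 0.5),  # deceptive: lower utility, inflated apparent fitness
]
print(selected_design(designs))                  # → (0.6, 0.5): deception wins
print(selected_design(designs, objective=True))  # → (0.9, 0.0): honesty wins
```

Because 0.6 + 0.5 exceeds 0.9, human-judged selection favors the deceptive design even though its genuine utility is lower; switching the reproduction criterion to utility alone removes that advantage, which is the mitigation the paper proposes.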