Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models

arXiv cs.AI · April 16, 2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper analyzes why large language models can become unpredictably unreliable in agentic workflows, tracing this behavior to numerical instability from finite floating-point precision.
  • It characterizes how rounding errors propagate through Transformer layers and identifies an early-layer chaotic “avalanche effect” where tiny perturbations can rapidly amplify or fully dissipate.
  • The authors report universal, scale-dependent chaotic behavior across models and datasets, dividing it into three regimes: stable (errors vanish), chaotic (errors dominate and outputs diverge), and signal-dominated (true input variation overcomes numerical noise).
  • Extensive validation across multiple datasets and Transformer architectures supports the proposed mechanism for unpredictability.
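
The root cause the paper points to can be illustrated with a minimal, self-contained sketch (not taken from the paper): IEEE-754 floating-point addition is not associative, so the same reduction evaluated in different orders (as happens with non-deterministic parallel kernels) can produce different bits, seeding the tiny perturbations whose fate the paper analyzes.

```python
# Floating-point addition is not associative: two orderings of the
# same three-term sum round differently and disagree in the last bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
print(abs(left - right))  # a one-ulp-scale discrepancy
```

Discrepancies of this size are harmless in isolation; the paper's point is that Transformer layers can amplify them until outputs diverge.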

Abstract

As Large Language Models (LLMs) are increasingly integrated into agentic workflows, their unpredictability stemming from numerical instability has emerged as a critical reliability issue. While recent studies have demonstrated the significant downstream effects of these instabilities, the root causes and underlying mechanisms remain poorly understood. In this paper, we present a rigorous analysis of how unpredictability is rooted in the finite numerical precision of floating-point representations, tracking how rounding errors propagate, amplify, or dissipate through Transformer computation layers. Specifically, we identify a chaotic "avalanche effect" in the early layers, where minor perturbations trigger binary outcomes: either rapid amplification or complete attenuation. Beyond specific error instances, we demonstrate that LLMs exhibit universal, scale-dependent chaotic behaviors characterized by three distinct regimes: 1) a stable regime, where perturbations fall below an input-dependent threshold and vanish, resulting in constant outputs; 2) a chaotic regime, where rounding errors dominate and drive output divergence; and 3) a signal-dominated regime, where true input variations override numerical noise. We validate these findings extensively across multiple datasets and model architectures.
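
The stable-versus-chaotic dichotomy can be made concrete with a toy dynamical system. The sketch below uses the logistic map as an illustrative stand-in for layer-to-layer error propagation (it is not the paper's Transformer analysis): with a contractive parameter a tiny perturbation dies out, as in the stable regime, while in the chaotic parameter range it is amplified to order one, as in the chaotic regime.

```python
def max_separation(r, eps=1e-12, steps=50):
    """Iterate the logistic map x -> r*x*(1-x) from two seeds that
    differ by eps, and return the largest separation reached.
    A toy stand-in for rounding-error propagation through layers;
    illustrative only, not the paper's model."""
    x, y = 0.4, 0.4 + eps
    peak = eps
    for _ in range(steps):
        x, y = r * x * (1 - x), r * y * (1 - y)
        peak = max(peak, abs(x - y))
    return peak

# r = 2.5: contractive dynamics, the perturbation never grows (stable regime)
print(max_separation(2.5))
# r = 4.0: chaotic dynamics, the perturbation is amplified to O(1) (chaotic regime)
print(max_separation(4.0))
```

The paper's third, signal-dominated regime corresponds to the case where the two inputs differ macroscopically, so the output difference reflects the input signal rather than amplified rounding noise.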