Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models

arXiv cs.AI / 4/27/2026


Key Points

  • The paper observes that LLMs can generate different outputs for the same input even when decoding with nominal temperature T=0, pointing to implementation-level nondeterminism.
  • It introduces a formal concept called “background temperature” (T_bg) to represent the effective randomness introduced by the inference environment even when the nominal temperature is set to T=0.
  • The authors relate T_bg to an explicit stochastic perturbation process governed by the inference environment (I) and define an equivalent ideal-system temperature T_n(I) for estimation.
  • They propose an empirical measurement protocol for estimating T_bg and report pilot experiments across a pool of major LLM providers, discussing implications for reproducibility, evaluation, and deployment (illustrative sketches of the formalization and the measurement protocol follow this list).
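
One concrete way to read the second and third key points is sketched below. This is an illustrative formalization consistent with the summary above, not the paper's exact definitions: the perturbation term δ_I, the observed output distribution p_I(·|x), and the divergence D (e.g. KL) are notation introduced here for the sketch.

```latex
% Illustrative sketch only; \delta_I, p_I and D are assumptions, not the paper's notation.
% The inference environment I perturbs the model's logits z(x) before the nominally
% greedy (T = 0) decoding step, so the emitted token y becomes a random variable:
\[
  y \;=\; \operatorname{argmax}_{v}\, \bigl[\, z_v(x) + \delta_{I,v} \,\bigr],
  \qquad \delta_I \ \text{a stochastic perturbation governed by } I .
\]
% The equivalent ideal-system temperature T_n(I) is the temperature at which an ideal,
% perturbation-free sampler best reproduces the observed output distribution p_I:
\[
  T_n(I) \;=\; \operatorname{argmin}_{T>0}\,
  D\bigl(\, p_I(\cdot \mid x) \,\big\|\, \mathrm{softmax}\bigl(z(x)/T\bigr) \,\bigr),
  \qquad T_{\mathrm{bg}} \;\approx\; T_n(I).
\]
```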

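As a first step of such a measurement protocol, one can simply probe repeatability: issue the same prompt many times at nominal T=0 and count how often the completions diverge. The sketch below assumes an OpenAI-compatible endpoint via the `openai` Python package; the model name, prompt, and trial count are placeholders chosen for illustration, not choices made in the paper.

```python
# Repeatability probe at nominal temperature T=0.
# Assumptions: an OpenAI-compatible API via the `openai` package; the model name,
# prompt, and number of trials are arbitrary placeholders.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "List three prime numbers greater than 100."
N_TRIALS = 50

completions = []
for _ in range(N_TRIALS):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # hypothetical choice; any chat model works here
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,         # nominal T = 0
        max_tokens=64,
    )
    completions.append(resp.choices[0].message.content)

counts = Counter(completions)
print(f"distinct completions: {len(counts)}/{N_TRIALS}")
print(f"modal completion frequency: {counts.most_common(1)[0][1] / N_TRIALS:.2f}")
```

Any number of distinct completions above one indicates nondeterminism despite T=0, i.e. a nonzero background temperature in the paper's terminology. Exact string matching is deliberately crude; a distribution-level view is sketched after the abstract below.
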
Abstract

Even when decoding with temperature T=0, large language models (LLMs) can produce divergent outputs for identical inputs. Recent work by Thinking Machines Lab highlights implementation-level sources of nondeterminism, including batch-size variation, kernel non-invariance, and floating-point non-associativity. In this short note we formalize this behavior by introducing the notion of background temperature T_bg, the effective temperature induced by an implementation-dependent perturbation process that is observed even at nominal T=0. We provide clean definitions, show how T_bg relates to a stochastic perturbation governed by the inference environment I, and propose an empirical protocol for estimating T_bg via the equivalent temperature T_n(I) of an ideal reference system. We conclude with a set of pilot experiments, run on a representative pool of major LLM providers, that demonstrate the idea, and we outline implications for reproducibility, evaluation, and deployment.
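
The abstract's estimation step, recovering an equivalent temperature T_n(I) for an ideal reference system, can be illustrated with a simple maximum-likelihood fit: choose the temperature T at which softmax(z/T) over reference logits best explains the token counts observed across repeated nominal-T=0 calls. The reference logits, toy counts, and search bounds below are assumptions made for this sketch, not the paper's estimator.

```python
# Illustrative sketch (not the paper's estimator): fit the temperature of an ideal
# softmax system so that it best explains the empirical distribution of tokens
# observed across repeated nominal-T=0 calls.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_likelihood(temperature, ref_logits, observed_counts):
    """Negative log-likelihood of observed token counts under softmax(logits/T)."""
    z = ref_logits / temperature
    z -= z.max()                          # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -float(np.dot(observed_counts, log_probs))

def fit_equivalent_temperature(ref_logits, observed_counts):
    """Return the T > 0 at which an ideal sampler best matches the observed counts."""
    result = minimize_scalar(
        neg_log_likelihood,
        bounds=(1e-4, 5.0),               # search range is an arbitrary assumption
        args=(ref_logits, observed_counts),
        method="bounded",
    )
    return result.x

# Toy example: three candidate tokens, slight leakage onto the runner-up token
# across 50 repeated "T=0" calls. Both arrays are made up for illustration.
ref_logits = np.array([4.0, 3.2, 0.5])
observed_counts = np.array([46, 4, 0])
print(f"equivalent temperature: {fit_equivalent_temperature(ref_logits, observed_counts):.3f}")
```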