Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models

arXiv cs.AI / 4/27/2026


Key Points

  • The paper observes that LLMs can generate different outputs for the same input even when decoding with nominal temperature T=0, pointing to implementation-level nondeterminism.
  • It introduces a formal concept called “background temperature” (T_bg) to represent the effective randomness introduced by the inference environment even when the nominal temperature is set to T=0.
  • The authors relate T_bg to an explicit stochastic perturbation process governed by the inference environment (I) and define an equivalent ideal-system temperature T_n(I) for estimation.
  • They propose an empirical measurement protocol for estimating T_bg and report pilot experiments across a pool of major LLM providers, discussing implications for reproducibility, evaluation, and deployment (illustrative sketches of the formalization and the measurement protocol follow this list).
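
One concrete way to read the second and third key points is sketched below. This is an illustrative formalization consistent with the summary above, not the paper's exact definitions: the perturbation term δ_I, the observed output distribution p_I(·|x), and the divergence D (e.g. KL) are notation introduced here for the sketch.

```latex
% Illustrative sketch only; \delta_I, p_I and D are assumptions, not the paper's notation.
% The inference environment I perturbs the model's logits z(x) before the nominally
% greedy (T = 0) decoding step, so the emitted token y becomes a random variable:
\[
  y \;=\; \operatorname{argmax}_{v}\, \bigl[\, z_v(x) + \delta_{I,v} \,\bigr],
  \qquad \delta_I \ \text{a stochastic perturbation governed by } I .
\]
% The equivalent ideal-system temperature T_n(I) is the temperature at which an ideal,
% perturbation-free sampler best reproduces the observed output distribution p_I:
\[
  T_n(I) \;=\; \operatorname{argmin}_{T>0}\,
  D\bigl(\, p_I(\cdot \mid x) \,\big\|\, \mathrm{softmax}\bigl(z(x)/T\bigr) \,\bigr),
  \qquad T_{\mathrm{bg}} \;\approx\; T_n(I).
\]
```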

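As a first step of such a measurement protocol, one can simply probe repeatability: issue the same prompt many times at nominal T=0 and count how often the completions diverge. The sketch below assumes an OpenAI-compatible endpoint via the `openai` Python package; the model name, prompt, and trial count are placeholders chosen for illustration, not choices made in the paper.

```python
# Repeatability probe at nominal temperature T=0.
# Assumptions: an OpenAI-compatible API via the `openai` package; the model name,
# prompt, and number of trials are arbitrary placeholders.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "List three prime numbers greater than 100."
N_TRIALS = 50

completions = []
for _ in range(N_TRIALS):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # hypothetical choice; any chat model works here
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,         # nominal T = 0
        max_tokens=64,
    )
    completions.append(resp.choices[0].message.content)

counts = Counter(completions)
print(f"distinct completions: {len(counts)}/{N_TRIALS}")
print(f"modal completion frequency: {counts.most_common(1)[0][1] / N_TRIALS:.2f}")
```

Any number of distinct completions above one indicates nondeterminism despite T=0, i.e. a nonzero background temperature in the paper's terminology. Exact string matching is deliberately crude; a distribution-level view is sketched after the abstract below.
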
Abstract

Even when decoding with temperature T=0, large language models (LLMs) can produce divergent outputs for identical inputs. Recent work by Thinking Machines Lab highlights implementation-level sources of nondeterminism, including batch-size variation, kernel non-invariance, and floating-point non-associativity. In this short note we formalize this behavior by introducing the notion of background temperature T_bg, the effective temperature induced by an implementation-dependent perturbation process that is observed even at nominal T=0. We provide clean definitions, show how T_bg relates to a stochastic perturbation governed by the inference environment I, and propose an empirical protocol for estimating T_bg via the equivalent temperature T_n(I) of an ideal reference system. We conclude with a set of pilot experiments, run on a representative pool of major LLM providers, that demonstrate the idea, and we outline implications for reproducibility, evaluation, and deployment.
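
The abstract's estimation step, recovering an equivalent temperature T_n(I) for an ideal reference system, can be illustrated with a simple maximum-likelihood fit: choose the temperature T at which softmax(z/T) over reference logits best explains the token counts observed across repeated nominal-T=0 calls. The reference logits, toy counts, and search bounds below are assumptions made for this sketch, not the paper's estimator.

```python
# Illustrative sketch (not the paper's estimator): fit the temperature of an ideal
# softmax system so that it best explains the empirical distribution of tokens
# observed across repeated nominal-T=0 calls.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_likelihood(temperature, ref_logits, observed_counts):
    """Negative log-likelihood of observed token counts under softmax(logits/T)."""
    z = ref_logits / temperature
    z -= z.max()                          # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -float(np.dot(observed_counts, log_probs))

def fit_equivalent_temperature(ref_logits, observed_counts):
    """Return the T > 0 at which an ideal sampler best matches the observed counts."""
    result = minimize_scalar(
        neg_log_likelihood,
        bounds=(1e-4, 5.0),               # search range is an arbitrary assumption
        args=(ref_logits, observed_counts),
        method="bounded",
    )
    return result.x

# Toy example: three candidate tokens, slight leakage onto the runner-up token
# across 50 repeated "T=0" calls. Both arrays are made up for illustration.
ref_logits = np.array([4.0, 3.2, 0.5])
observed_counts = np.array([46, 4, 0])
print(f"equivalent temperature: {fit_equivalent_temperature(ref_logits, observed_counts):.3f}")
```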