The Illusion of Stochasticity in LLMs

arXiv cs.CL / 4/9/2026


Key Points

  • The paper argues that “reliable stochastic sampling” is a critical but not yet satisfied requirement for LLMs when used as autonomous agents that must sample from target probability distributions.
  • It identifies a core failure mode: LLMs cannot consistently translate their internal probability estimates into the stochastic outputs they produce, unlike conventional RL agents that use external sampling mechanisms.
  • Using experiments across multiple model families, sizes, prompting styles, and target distributions, the authors quantify how often and how severely this mismatch occurs.
  • The study finds that frontier models can sometimes use provided random seeds to better match target distributions, but still struggle fundamentally with direct distribution-accurate sampling.
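The seed-conversion task mentioned above has a classical analogue: inverse-transform sampling, where a uniform random draw (the "seed") is pushed through the inverse CDF of the target distribution. The sketch below is illustrative only (the function name, target distribution, and sample count are all hypothetical, not taken from the paper), showing what a correct seed-to-sample conversion looks like when done by an external mechanism rather than by the model itself:

```python
import numpy as np

def inverse_cdf_sample(u, probs):
    """Map a uniform draw u in [0, 1) to a category index via the
    inverse CDF of a discrete distribution (inverse-transform sampling).
    This is the kind of seed-to-sample conversion the paper reports
    frontier models can sometimes perform when given explicit seeds."""
    cdf = np.cumsum(probs)                      # cumulative distribution
    return int(np.searchsorted(cdf, u, side="right"))

# Hypothetical target distribution over three options.
target = [0.2, 0.5, 0.3]
rng = np.random.default_rng(0)
draws = [inverse_cdf_sample(rng.random(), target) for _ in range(10_000)]
freqs = np.bincount(draws, minlength=3) / len(draws)
print(freqs)  # empirical frequencies should be close to [0.2, 0.5, 0.3]
```

An external sampler like this matches the target distribution to within sampling noise by construction; the paper's point is that LLMs asked to emulate this process internally do not.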

Abstract

In this work, we demonstrate that reliable stochastic sampling is a fundamental yet unfulfilled requirement for Large Language Models (LLMs) operating as agents. Agentic systems are frequently required to sample from distributions, often inferred from observed data, a process the LLM itself must emulate. This creates a distinct failure point: whereas standard RL agents rely on external sampling mechanisms, LLMs fail to map their internal probability estimates onto their stochastic outputs. Through rigorous empirical analysis across multiple model families, model sizes, prompting styles, and distributions, we demonstrate the extent of this failure. Crucially, we show that while powerful frontier models can convert provided random seeds into samples from target distributions, their ability to sample directly from specific distributions is fundamentally flawed.
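The abstract speaks of measuring "the extent of this failure" without naming a metric here. One standard way to quantify the gap between a requested distribution and the frequencies actually produced is total variation distance; the sketch below uses hypothetical numbers (not results from the paper) purely to illustrate the measurement:

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two discrete distributions:
    half the L1 distance. 0 means a perfect match, 1 maximal mismatch."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

# Hypothetical: the uniform target an agent is asked to sample from,
# versus skewed empirical frequencies recovered from an LLM's outputs.
target = np.array([0.25, 0.25, 0.25, 0.25])
observed = np.array([0.55, 0.20, 0.15, 0.10])
print(total_variation(target, observed))  # ≈ 0.30
```

Computed over many model families, sizes, prompts, and target distributions, a statistic like this is one plausible way to express how often and how severely the mismatch occurs.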