The Randomness Floor: Measuring Intrinsic Non-Randomness in Language Model Token Distributions
arXiv cs.CL / 4/28/2026
💬 Opinion · Models & Research
Key Points
- The paper proposes Entropic Deviation (ED), a normalized KL divergence between a language model’s next-token distribution and the uniform distribution over its vocabulary, to quantify intrinsic non-randomness.
- Across 31,200 generations over seven models, ED remains substantial even under semantically neutral prompts, suggesting much of the observed non-randomness is embedded in the learned weights rather than being induced by context.
- Transformer families such as Gemma, Llama, and Qwen show convergent ED values despite differences in training data and vocabularies, indicating a structural property of pretrained transformers.
- In contrast, the state space model (Mamba2) exhibits a different “regime” with about twice the ED, lower within-sequence variance, and strong temperature sensitivity, while transformers are comparatively insensitive.
- Cross-lingual tests with Qwen-32B show ED gradients that are stable across five languages and persist even when comparing languages that share identical tokenizer subsets, implying that language modulates the randomness bound beyond tokenization effects.
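The ED metric described above can be sketched in a few lines. Since the paper’s exact normalization is not stated here, this sketch assumes the natural choice of dividing KL(p ‖ uniform) by its maximum value, log V, which maps ED onto [0, 1] (0 = perfectly uniform, 1 = deterministic); the function name and normalization are illustrative assumptions, not the authors’ code.

```python
import math

def entropic_deviation(probs):
    """Sketch of Entropic Deviation (ED): KL divergence from the
    uniform distribution, normalized by its maximum, log V.
    Assumes `probs` is a full next-token distribution summing to 1.
    Identity used: KL(p || U) = log V - H(p)."""
    V = len(probs)
    # p * V = p / (1/V), so p*log(p*V) is the KL summand vs. uniform
    kl = sum(p * math.log(p * V) for p in probs if p > 0)
    return kl / math.log(V)  # 0 for uniform, 1 for a one-hot distribution

# Uniform distribution has zero deviation; a one-hot distribution is maximal.
print(entropic_deviation([0.25] * 4))            # 0.0
print(entropic_deviation([1.0, 0.0, 0.0, 0.0]))  # 1.0
```

In practice `probs` would be the softmax over the model’s logits at a given position; averaging ED over positions and generations would yield the per-model values the paper compares across architectures.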