The LLM Rolls Dice
An LLM can return different output each time it receives the same input. After computing the probability distribution over the next token (loosely, "the next word"), the model samples from that distribution, much like drawing lots. Parameters such as temperature, top-p, and top-k control how that draw is made.
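The sampling step can be sketched in a few lines of Python. This is a minimal illustration, not how any particular model is implemented: the distribution is a hand-written dictionary (the same illustrative probabilities used in the "Japan's capital is" example), and `sample_next_token` is a hypothetical helper name.

```python
import random

# Hypothetical next-token distribution for the prompt "Japan's capital is".
next_token_probs = {"Tokyo": 0.95, "Kyoto": 0.03, "Osaka": 0.01, "Other": 0.01}


def sample_next_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random()  # unseeded, so results differ from run to run
draws = [sample_next_token(next_token_probs, rng) for _ in range(20)]
print(draws)  # mostly "Tokyo", with an occasional other token
```

Running the script several times gives different lists, which is the same effect as an LLM answering the same prompt differently on each call.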
Temperature
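A value typically from 0 to 2, though the allowed range varies by provider. It controls how much the probability distribution is sharpened or flattened: values below 1 concentrate probability on the top candidates, values above 1 spread it across more candidates, and 1 leaves the distribution unchanged.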
Example: The Word After "Japan's capital is"
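| Candidate | Raw probability (T=1.0) | After T=0.5 | After T=2.0 |
|---|---|---|---|
| Tokyo | 0.95 | 0.9988 | 0.723 |
| Kyoto | 0.03 | 0.0010 | 0.129 |
| Osaka | 0.01 | 0.0001 | 0.074 |
| Other | 0.01 | 0.0001 | 0.074 |

Each value is the raw probability raised to the power 1/T and then renormalized, treating the four rows as the complete distribution.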
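The effect of temperature on this example can be reproduced with a short sketch. It assumes the standard formulation, in which each probability is raised to the power 1/T and the results are renormalized (equivalent to dividing the logits by T before the softmax). The function name `apply_temperature` is hypothetical, and treating "Other" as a single token is a simplifying assumption. The table's figures are illustrative, so the printed values need not match them digit for digit.

```python
def apply_temperature(probs, temperature):
    """Rescale a distribution: p_i ** (1/T), then renormalize to sum to 1.

    Equivalent to dividing the logits by T before the softmax.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive for this formula")
    powered = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    total = sum(powered.values())
    return {tok: v / total for tok, v in powered.items()}


# Raw probabilities from the example; "Other" is treated as one token (assumption).
raw = {"Tokyo": 0.95, "Kyoto": 0.03, "Osaka": 0.01, "Other": 0.01}

for t in (0.5, 1.0, 2.0):
    scaled = apply_temperature(raw, t)
    row = ", ".join(f"{tok}={p:.4f}" for tok, p in scaled.items())
    print(f"T={t}: {row}")
```

At T=1.0 the distribution is returned unchanged, at T=0.5 almost all of the probability moves to "Tokyo", and at T=2.0 the less likely candidates gain a noticeably larger share.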
Temperature Guide
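- 0.0: greedy decoding. The most likely token is always chosen, so output is effectively the same every time (hosted APIs can still show small variations). For tests and classification.
- 0.0-0.3: nearly deterministic. Fact-checking, extraction, summarization.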
- 0.5-0.7: standard. Chat, Q&A, coding.
- 0.8-1.0: creative. Novels, poetry, brainstorming, ad copy.
- Above 1.0: very diverse output, with a risk of degraded quality.
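The guide above can be checked empirically with a small simulation. This is a sketch under stated assumptions: the distribution is the illustrative "Japan's capital is" example, `sample_with_temperature` is a hypothetical helper, and temperature 0 is handled as greedy (argmax) decoding because the 1/T exponent is undefined there. Real inference stacks may treat that case differently.

```python
import random


def sample_with_temperature(probs, temperature, rng):
    """Pick one token; temperature 0 is treated as greedy (argmax) decoding."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # p ** (1/T) is undefined at T=0, so take the limit: the top token.
        return max(probs, key=probs.get)
    tokens = list(probs)
    weights = [probs[tok] ** (1.0 / temperature) for tok in tokens]
    # random.choices accepts unnormalized weights.
    return rng.choices(tokens, weights=weights, k=1)[0]


probs = {"Tokyo": 0.95, "Kyoto": 0.03, "Osaka": 0.01, "Other": 0.01}
rng = random.Random(42)  # seeded only to make the demo repeatable

for t in (0.0, 0.3, 0.7, 1.0, 2.0):
    draws = [sample_with_temperature(probs, t, rng) for _ in range(1000)]
    share = draws.count("Tokyo") / len(draws)
    print(f"T={t}: top token chosen {share:.1%} of the time")
```

The share of draws that land on the top token falls as the temperature rises, which is why low settings suit extraction and classification while high settings trade reliability for variety.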




