What Is Hallucination
Hallucination is the phenomenon in which an LLM presents something untrue as if it were fact. Examples include citing a nonexistent paper, outputting a nonexistent API function, and fabricating a historical fact. Because the output is fluent and persuasive, people are easily fooled.
Why It Happens: 5 Causes
1. The Nature of Next-Word Prediction
An LLM only predicts the word most likely to come next given the context so far; it is not trained directly on whether a statement is true. The result is a chain of plausible words, which may or may not be accurate.
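As a rough illustration of this point, here is a minimal sketch of a bigram "language model" in Python. It is not how a real LLM works internally; the corpus, the `train_bigrams` and `generate` helpers, and the greedy decoding rule are all assumptions made up for this example. Every training sentence is true, yet the most likely continuation is a fluent false statement.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; every sentence in it is true.
CORPUS = [
    "paris is the capital of france",
    "marseille is a port of france",
    "rome is the capital of italy",
    "berlin is the capital of germany",
]
END = "</s>"  # end-of-sentence marker (an assumption of this sketch)


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_len=10):
    """Greedy decoding: always append the most frequent next word."""
    words = [start]
    while len(words) < max_len:
        followers = counts[words[-1]]
        if not followers:
            break
        nxt = max(followers, key=followers.get)
        if nxt == END:
            break
        words.append(nxt)
    return " ".join(words)


model = train_bigrams(CORPUS)
sentence = generate(model, "rome")
print(sentence)  # prints "rome is the capital of france"
```

The model never saw that sentence. It simply follows the statistically strongest path ("of" is followed by "france" more often than by "italy"), and nothing in the procedure checks the result against reality.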
2. Biased/Old Training Data
The model only has information up to its training cutoff, and that data may itself be skewed. It cannot answer questions about the latest news, and it may fill the gap with a plausible-sounding falsehood.
3. Knowledge Boundary
For minor topics and niche fields, training data is thin, so the model fills the gaps with guesswork.
4. Lack of Context
If the prompt is vague, the LLM fills in the missing details arbitrarily. Asked "What is A's family makeup?" when it is unclear which A is meant, it may return a nonsensical answer.
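The "which A?" problem can be sketched with a toy lookup. The records, field names, and helper functions below are hypothetical and exist only for illustration; the point is that a resolver that silently picks one candidate returns a confident answer even though the question was ambiguous.

```python
# Hypothetical records: two different people share the name "A".
PEOPLE = [
    {"name": "A", "city": "Osaka", "family": ["spouse", "two children"]},
    {"name": "A", "city": "Tokyo", "family": []},
]


def candidates(name):
    """Return every record that matches the name."""
    return [p for p in PEOPLE if p["name"] == name]


def naive_answer(name):
    """Silently pick the first match, filling the gap in the question."""
    matches = candidates(name)
    return matches[0]["family"] if matches else None


guess = naive_answer("A")
ambiguous = len(candidates("A")) > 1
print(guess, "| ambiguous question:", ambiguous)
```

The answer looks specific, but for anyone who meant the other A it is simply wrong. Adding a disambiguating detail to the question (here, the city) is what removes the guesswork.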
5. Compression Loss
An LLM compresses its training data into model weights, so reproducing details accurately is hard. What comes out is a "roughly correct" approximation.
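As an analogy only (real model weights do not store facts this way), lossy compression can be sketched with simple quantization in Python. The `quantize` and `dequantize` helpers and the sample years are assumptions made up for this example: each value is stored as the nearest multiple of 10, so the restored values are close to the originals but the exact details are gone.

```python
def quantize(values, step):
    """Lossy 'compression': keep only the index of the nearest multiple of step."""
    return [round(v / step) for v in values]


def dequantize(codes, step):
    """Reconstruct approximate values from the stored codes."""
    return [c * step for c in codes]


years = [1969, 1989, 2007]  # illustrative values, not tied to any dataset
codes = quantize(years, 10)
restored = dequantize(codes, 10)
errors = [abs(a - b) for a, b in zip(years, restored)]

print(restored)  # prints [1970, 1990, 2010]
print(errors)    # prints [1, 1, 3]
```

Each restored year is "roughly correct", which is the same kind of near-miss that shows up when a model gets a date, figure, or name almost right.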




