The 3-Stage Lifecycle
An LLM's lifecycle has three stages: pre-training, post-training, and inference. The stages differ greatly in cost structure and difficulty.
1. Pre-training
This stage teaches the model how language works and gives it broad knowledge of the world.
- Data: tens of trillions of tokens from web, books, papers, code, images
- Task: "predict the next word" (next-token prediction)
- Compute: a GPT-4-class run is reported to cost on the order of USD 100 million or more, covering GPU time and electricity
- Duration: weeks to months, with thousands to tens of thousands of GPUs running continuously
- Who: a small number of well-resourced organizations such as OpenAI, Anthropic, Google, Meta, Mistral
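To get a feel for why the run takes weeks on thousands of GPUs, a common rule of thumb for dense transformer models estimates training compute as roughly 6 × parameters × tokens floating-point operations. The sketch below applies that approximation. Every number in it (model size, token count, GPU count, peak throughput, utilization) is a hypothetical placeholder chosen for illustration, not a figure for any real model.

```python
SECONDS_PER_DAY = 86_400


def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute: about 6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens


def training_days(n_params: float, n_tokens: float, n_gpus: int,
                  peak_flops_per_gpu: float, utilization: float) -> float:
    """Wall-clock days, assuming a fixed fraction of peak throughput is sustained."""
    effective_flops_per_second = n_gpus * peak_flops_per_gpu * utilization
    seconds = training_flops(n_params, n_tokens) / effective_flops_per_second
    return seconds / SECONDS_PER_DAY


# Hypothetical inputs: 70B parameters, 15T tokens, 4096 GPUs,
# 1e15 FLOP/s peak per GPU, 40% sustained utilization.
days = training_days(70e9, 15e12, 4096, 1e15, 0.40)
print(f"{days:.1f} days")
```

With these assumed inputs the estimate comes out at roughly six weeks. Halving the GPU count or the utilization doubles the duration, which is why only a few organizations can afford such runs.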
This is the phase in which general world knowledge, grammar, and the seeds of logical reasoning take shape.
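The "predict the next word" task can be made concrete as a loss function. The following is a minimal sketch, assuming the model emits one score (logit) per vocabulary entry at each position. It computes the average cross-entropy between the prediction at position t and the actual token at position t+1. The function names and the toy four-token vocabulary are illustrative, not taken from any particular library.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(logits_per_position, token_ids):
    """Average cross-entropy of predicting token t+1 from position t."""
    targets = token_ids[1:]  # the first token has no preceding context to predict it
    losses = []
    for logits, target in zip(logits_per_position, targets):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)


# Toy case: vocabulary of 4 tokens, sequence [0, 2, 1].
# A model with no preference (all logits equal) scores log(4) per token.
uniform = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(next_token_loss(uniform, [0, 2, 1]))
```

Pre-training amounts to lowering this number across trillions of tokens. A model that puts more probability on the token that actually comes next gets a smaller loss.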
2. Post-training
A model that has only been pre-trained is just a next-word predictor, so it needs further adjustment to follow human instructions, avoid harmful output, and hold a natural dialogue.
SFT (Supervised Fine-Tuning)
The model is fine-tuned on pairs of the form "question → ideal answer." This gives it its initial instruction-following ability.
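One common way to train on such pairs, sketched here as an assumption rather than as the method any specific lab uses, is to reuse the next-token loss but count only the answer tokens, so the model learns to produce the ideal answer rather than to reproduce the question. The `IGNORE` sentinel and both helper functions are hypothetical names for illustration. The value -100 follows a convention used by some training frameworks.

```python
import math

IGNORE = -100  # hypothetical sentinel marking positions excluded from the loss


def build_sft_labels(prompt_ids, answer_ids):
    """Concatenate question and answer tokens; mask the question part of the labels."""
    input_ids = list(prompt_ids) + list(answer_ids)
    labels = [IGNORE] * len(prompt_ids) + list(answer_ids)
    return input_ids, labels


def sft_loss(token_log_probs, labels):
    """Average negative log-probability over answer tokens only.

    token_log_probs[i] is assumed to be the log-probability the model gave
    to the correct token at position i (the next-token shift already applied).
    """
    kept = [-lp for lp, label in zip(token_log_probs, labels) if label != IGNORE]
    return sum(kept) / len(kept)


# Toy example with made-up token ids: a 2-token question and a 3-token answer.
input_ids, labels = build_sft_labels([5, 6], [7, 8, 9])
print(labels)  # the two question positions are masked out
print(sft_loss([math.log(0.5)] * 5, labels))
```

Only the three answer positions contribute to the loss here, so the quality of the "ideal answer" in each pair directly determines what the model learns to imitate.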


