Cases Where Fine-Tuning (FT) Works
FT is technology to "customize the LLM's behavior." Not everything is solved by FT; there are suited and unsuited cases.
Cases FT Suits
- Want to teach a company-specific tone
- Want a special output format kept every time
- Want to raise accuracy of fixed tasks like classification/extraction
- Want familiarity with a specific domain's jargon
- Want to shrink model size to lower inference cost
Cases FT Doesn't Suit
- Want to add new knowledge → RAG is more effective
- Frequently updated info → RAG (no retraining)
- One-time customization → prompt engineering suffices
Main Methods
Full Fine-tuning
Re-train all model parameters. Dozens of GPUs × days to weeks. Not realistic for frontier models. Barely possible for mid-size models like Llama 3 8B.
LoRA (Low-Rank Adaptation)
Freeze the model body, train only added small matrices. Possible on 1-2 GPUs × a few hours. The most practical.
- Storage: tens of MB (full model is tens of GB)
- Training speed: 3-10x faster
- Easy switching: select among multiple LoRAs at load



