Prompt Evaluation Basics: Reproducibility and Accuracy
"It somehow got better" is not good enough in development. Every prompt change needs its effect on quality measured with an evaluation.
Building the Evaluation
- Evaluation dataset: collect representative inputs and expected outputs
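
As a rough illustration of what such a dataset might look like, here is a minimal sketch in Python. The names `EvalCase`, `EVAL_DATASET`, `accuracy`, and `stub_model` are hypothetical, the example cases are made up, and exact string match is only one possible scoring rule; a real evaluation would call the actual model in place of the stub.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    """One evaluation example: an input and the output we expect for it."""
    input_text: str
    expected: str


# Hypothetical cases for illustration only; a real dataset should be
# sampled from representative production inputs.
EVAL_DATASET: List[EvalCase] = [
    EvalCase("2 + 2", "4"),
    EvalCase("capital of France", "Paris"),
    EvalCase("opposite of hot", "cold"),
]


def accuracy(predict: Callable[[str], str], dataset: List[EvalCase]) -> float:
    """Return the fraction of cases where predict() exactly matches expected."""
    if not dataset:
        raise ValueError("dataset is empty")
    correct = sum(
        predict(case.input_text).strip() == case.expected for case in dataset
    )
    return correct / len(dataset)


def stub_model(text: str) -> str:
    """Stand-in for a real model call (assumption: deterministic lookup)."""
    canned = {"2 + 2": "4", "capital of France": "Paris"}
    return canned.get(text, "")


if __name__ == "__main__":
    print(f"accuracy: {accuracy(stub_model, EVAL_DATASET):.2f}")
```

Keeping the dataset fixed between runs is what makes two prompt versions comparable: the same inputs are scored by the same rule, so a change in the number reflects the prompt rather than the test set.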