The hourly price did not look scary. What hurt was running the same job again, reloading the same model again, and paying for the same mistake again.
Why this gets expensive fast
- a weak setup does not just slow the job down; it also makes each failure more expensive
- retries quietly multiply the real bill
- cheap hourly pricing looks fine until the job keeps falling over
- people compare one run on paper and ignore the ugly reality of repeated runs
The mistake
A lot of people pick the card with the cheapest hourly rate and miss the real cost: reloading models, rerunning jobs, and burning another evening on the same failure pattern.
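To see how retries multiply the bill, here is a minimal cost sketch. It assumes attempts fail independently with a fixed probability, so the expected number of attempts until one succeeds is 1 / (1 - p), and it bills every attempt for the full model load plus run time. `JobProfile`, `expected_attempts`, and `expected_cost_per_success` are hypothetical names for illustration, and the numbers in the usage lines are made up, not real GPU prices.

```python
from dataclasses import dataclass


@dataclass
class JobProfile:
    """Hypothetical description of one GPU job; every field is an illustrative input."""
    hourly_rate: float   # price per GPU-hour, in whatever currency you pay in
    load_hours: float    # time to pull and load the model before useful work starts
    run_hours: float     # time for one full run once the model is loaded
    failure_prob: float  # chance a single attempt fails and must be rerun, 0 <= p < 1


def expected_attempts(failure_prob: float) -> float:
    """Expected attempts until the first success, assuming independent attempts (geometric)."""
    if not 0.0 <= failure_prob < 1.0:
        raise ValueError("failure_prob must be in [0, 1)")
    return 1.0 / (1.0 - failure_prob)


def expected_cost_per_success(job: JobProfile) -> float:
    """Expected spend to get one successful run.

    Simplification: every attempt, failed or not, is billed for the full load + run time.
    Real failures often die partway through, so treat this as an upper-end estimate.
    """
    cost_per_attempt = job.hourly_rate * (job.load_hours + job.run_hours)
    return cost_per_attempt * expected_attempts(job.failure_prob)


if __name__ == "__main__":
    # Hypothetical numbers, not real GPU prices.
    job = JobProfile(hourly_rate=1.0, load_hours=0.5, run_hours=1.5, failure_prob=0.5)
    on_paper = job.hourly_rate * job.run_hours  # the "one run on paper" estimate
    print(on_paper, expected_cost_per_success(job))  # 1.5 4.0
```

With a coin-flip failure rate, the job that looked like 1.5 units on paper costs about 4.0 once reloads and reruns are counted.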
Practical rule
- keep using an RTX 4090 for small, low-risk jobs and simple experiments
- move to an A100 80GB once retries and restarts are becoming routine
- only evaluate an H100 when the workload is already obviously huge
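One way to put a number on "retries and restarts are becoming routine" is to solve for the failure rate at which the cheaper-per-hour card stops being cheaper per successful run. This is a sketch under the same assumptions as a simple retry model: independent failures, each attempt billed in full, and hours that include model load plus run time on that specific card. Setting the expected costs equal, c1 / (1 - p) = r2 * t2 / (1 - p2), gives p = 1 - c1 * (1 - p2) / (r2 * t2). The function name `break_even_failure_prob` and the example numbers are hypothetical.

```python
def break_even_failure_prob(
    cheap_rate: float,
    cheap_hours: float,
    pricey_rate: float,
    pricey_hours: float,
    pricey_failure_prob: float = 0.0,
) -> float:
    """Failure probability above which the cheaper-per-hour card costs more per success.

    Solves cheap_rate * cheap_hours / (1 - p) == pricey_rate * pricey_hours / (1 - pricey_failure_prob)
    for p. Hours should cover model load plus one run on that card.
    Returns 0.0 if the cheap card is never cheaper, even with zero failures.
    """
    if not 0.0 <= pricey_failure_prob < 1.0:
        raise ValueError("pricey_failure_prob must be in [0, 1)")
    # Expected cost per success on the pricier, more stable setup.
    pricey_cost = pricey_rate * pricey_hours / (1.0 - pricey_failure_prob)
    # Cost of a single attempt on the cheaper setup.
    cheap_attempt_cost = cheap_rate * cheap_hours
    return max(0.0, 1.0 - cheap_attempt_cost / pricey_cost)


if __name__ == "__main__":
    # Hypothetical: the cheap card costs 1.0/hour but needs 3 hours per attempt;
    # the pricier card costs 2.0/hour, needs 2 hours, and rarely fails.
    print(break_even_failure_prob(1.0, 3.0, 2.0, 2.0))  # 0.25
```

In this made-up case, once more than about a quarter of attempts on the cheaper card fail, the pricier card is the cheaper way to get a finished job.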
The simple takeaway
If the hourly rate looks cheap but the same job keeps needing another retry, the model is not what got expensive. The repeated failure did.


