The Model Was Cheap. The Retries Became the Bill.

Dev.to / 4/3/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • Hourly GPU prices that look cheap can become costly when jobs fail repeatedly, because each failure forces a rerun and another model reload.
  • Retries and restarts quietly multiply the total bill, turning a “cheap” setup into a high overall cost.
  • The article argues that people often compare only a single run on paper and ignore the real-world cost of repeated failures.
  • It provides a practical GPU selection rule: use RTX 4090 for small, low-failure-risk experiments, switch to A100 80GB as retries become common, and reserve H100 for clearly huge workloads.
  • The key takeaway is that the repeated failure pattern—not the model’s unit cost—drives the expense.

The hourly price did not look scary. What hurt was running the same job again, reloading the same model again, and paying for the same mistake again.

Why this gets expensive fast

  • a weak setup does not only slow the job down, it makes failures more expensive
  • retries quietly multiply the real bill
  • cheap hourly pricing looks fine until the job keeps falling over
  • people compare one run on paper and ignore the ugly reality of repeated runs
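The multiplication above is easy to make concrete with a toy expected-cost model. All numbers here are hypothetical, purely for illustration; plug in your own rates, runtimes, and failure odds:

```python
def expected_cost(hourly_rate, run_hours, reload_hours, failure_prob):
    """Expected total cost when each attempt fails independently with
    probability `failure_prob` and every retry pays the model reload again.
    Expected attempts follow the mean of a geometric distribution."""
    expected_attempts = 1 / (1 - failure_prob)
    return hourly_rate * (run_hours + reload_hours) * expected_attempts

# A "cheap" card that keeps falling over vs. a pricier card that rarely fails.
# These inputs are made up to show the shape of the math, not benchmarks.
cheap = expected_cost(hourly_rate=0.5, run_hours=4.0, reload_hours=0.5, failure_prob=0.7)
solid = expected_cost(hourly_rate=2.0, run_hours=2.0, reload_hours=0.25, failure_prob=0.05)

print(f"cheap card, flaky setup: ${cheap:.2f}")   # $7.50
print(f"pricier card, stable:    ${solid:.2f}")   # $4.74
```

The single-run cost of the cheap card ($2.25) looks like a quarter of the pricier one, but once the retry multiplier is in the bill, it loses.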

The mistake

A lot of people focus on the cheapest hourly card and miss the real cost: reloading models, rerunning jobs, and burning another evening on the same failure pattern.

Practical rule

  • keep using RTX 4090 for small jobs, low failure risk, and simple experiments
  • move to A100 80GB when retries and restarts are becoming normal
  • only evaluate H100 when the workload is already obviously huge
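The three tiers above reduce to a tiny decision function. This is just the rule of thumb encoded as code; the two inputs are judgment calls, not measured metrics:

```python
def pick_gpu(retries_are_common: bool, workload_is_huge: bool) -> str:
    """Rough tier picker for the rule of thumb above (illustrative only)."""
    if workload_is_huge:
        return "H100"       # only evaluate H100 for obviously huge workloads
    if retries_are_common:
        return "A100 80GB"  # retries and restarts are becoming normal
    return "RTX 4090"       # small jobs, low failure risk, simple experiments

print(pick_gpu(retries_are_common=False, workload_is_huge=False))  # RTX 4090
print(pick_gpu(retries_are_common=True, workload_is_huge=False))   # A100 80GB
```

Note the ordering: scale trumps retry rate, and retry rate trumps the sticker price.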

The simple takeaway

If the hourly rate looks cheap but the same job keeps eating another retry, the model is not what got expensive. The repeated failure did.
