I was paying 3x too much for AI APIs. Here's what I changed.

Dev.to / 4/23/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The author realized they were paying about $80/month for AI API side projects by overusing flagship models (defaulting to Claude 3.5 Sonnet and GPT-4o) even for simple tasks.
  • They reduced costs by swapping a simple text-cleanup call to a much cheaper model (Gemini 2.5 Flash Lite), cutting per-request costs by roughly 30x while keeping output quality similar.
  • They lowered expenses further by enabling prompt caching and trimming system prompts, noting that cached tokens and shortening prompts can drastically reduce repeated input costs.
  • To avoid manual pricing calculations, they built a model cost comparison tool (quantacost.com) that verifies up-to-date pricing and lets users compare many models side by side.
  • The core takeaway is a rule-of-thumb: start with budget models for “small transformations” and upgrade only if the cheaper option fails, since the cheapest “doesn’t fail” model is usually best.

I started using AI APIs about a year ago for side projects I was hacking on in the evenings. Nothing production scale.

By month three I was running up about $80 a month in charges. Not wild, but when I broke it down, I was spending way more than I needed to. Half of what I was doing could have run on a cheap model for pennies. I was just lazy.

Here's what I actually changed:

First, I stopped using the flagship for everything. My defaults were Claude 3.5 Sonnet and GPT-4o. Both great. Both way overpowered for half of what I asked of them.

I had a little utility that turned a messy chunk of text into a clean title. Take in a paragraph, return one sentence. I was using Sonnet at $3 input and $15 output per million tokens. For a task a much simpler model could handle.

Swapping that one call to Gemini 2.5 Flash Lite at $0.10 input and $0.40 output cut the per-request cost by about 30x. Output quality was identical.
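The math, if you want to check it yourself. The token counts below are illustrative guesses (roughly a paragraph in, a sentence out), not measured numbers:

```python
def request_cost(input_tokens, output_tokens, in_price, out_price):
    """Cost of one API call; prices are USD per million tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical title-cleanup call: ~300 tokens in, ~20 tokens out.
sonnet = request_cost(300, 20, 3.00, 15.00)      # Claude 3.5 Sonnet
flash_lite = request_cost(300, 20, 0.10, 0.40)   # Gemini 2.5 Flash Lite

# Roughly a 30x gap between the two models on this shape of request.
print(f"Sonnet: ${sonnet:.6f}  Flash Lite: ${flash_lite:.6f}  "
      f"ratio: {sonnet / flash_lite:.0f}x")
```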

Rule I follow now. If the task is "transform this text a little," try a budget model first. Only reach for a flagship if the budget one actually fails.

Second, I cached and trimmed my system prompts. Every major provider offers prompt caching now. Anthropic gives you 90 percent off cached tokens. OpenAI does it automatically once your prompt goes over 1,024 tokens.

At 3,000 calls a month with a 600 token system prompt, that prompt alone was costing me $5.40 on Sonnet. With caching, 54 cents.
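The arithmetic behind those two numbers, simplified (this ignores the one-time premium providers charge to write the cache in the first place):

```python
calls_per_month = 3_000
prompt_tokens = 600
input_price = 3.00       # Sonnet input, USD per million tokens
cache_discount = 0.90    # cached reads billed at 10% of the base rate

full = calls_per_month * prompt_tokens / 1e6 * input_price
cached = full * (1 - cache_discount)
print(f"uncached: ${full:.2f}/mo  cached: ${cached:.2f}/mo")
# uncached: $5.40/mo  cached: $0.54/mo
```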

While I was in there, I actually read my prompt for the first time in months. It was a mess. "Please provide a response." "It would be helpful if you could." Polite costs tokens. I cut it from 600 to 300. Saves 50 percent on input forever.

Read your system prompt out loud. If it sounds like a cover letter, it's too long.

Third, I got tired of doing the math. For every new model I wanted to try, I was running the same spreadsheet. Input tokens times price per million. Output tokens times price per million. Add. Check caching. It took long enough that I'd just pick something and hope.
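That spreadsheet, as a function. The price table here is an illustrative snapshot, not live data; always check the official pricing pages:

```python
# Illustrative prices (USD per million tokens): (input, output).
PRICES = {
    "claude-3.5-sonnet":     (3.00, 15.00),
    "gpt-4o":                (2.50, 10.00),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}

def compare(in_tokens, out_tokens):
    """Rank models by per-request cost, cheapest first."""
    costs = {
        name: (in_tokens * inp + out_tokens * out) / 1e6
        for name, (inp, out) in PRICES.items()
    }
    return sorted(costs.items(), key=lambda kv: kv[1])
```

Calling `compare(300, 20)` ranks the three models for the title-cleanup shape of request, which is exactly the check I used to do by hand for each new model.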

So I built it into a tool. It's at quantacost.com. Paste text, pick a model, see what it costs. Compare 39 models side by side. Free, no signup. Prices are verified every morning against the official pricing pages, because I got burned once using someone else's calculator with numbers that were a year stale.

The right model for most tasks is not the smartest one. It's the cheapest one that doesn't fail.