How I Found $1,240/Month in Wasted LLM API Costs (And Built a Tool to Find Yours)

Dev.to / 4/5/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The author reports identifying roughly $1,240/month in avoidable LLM API waste out of a $2,847 monthly spend, including retries, duplicates, and excessive context length.
  • Major waste sources included 34% retry calls caused by JSON-in-markdown formatting, 85% duplicate classifier calls due to missing caching, and using GPT-4o for a simple yes/no classification instead of a cheaper model.
  • The chatbot cost was driven by repeatedly sending full conversation history, with the author finding that truncating to the last few messages could save about $155/month.
  • To measure and prevent similar problems, the author built an open-source Python CLI (“LLM Cost Profiler”) that wraps existing LLM clients and logs API calls to a local SQLite database.
  • The profiling tool is designed to be non-invasive—logging is silent and intended not to affect the application if instrumentation fails, while supporting both OpenAI and Anthropic clients.

I was spending about $2,800/month on OpenAI and Anthropic APIs across a few projects.

I knew some of it was wasteful. I just couldn't prove it. The provider dashboards show you one number — your total bill. That's like getting an electricity bill with no breakdown. Is it the AC? The lights? The server room? No idea.

So I built a tool to find out. What it discovered was honestly embarrassing.

What I found

34% of my summarizer calls were retries. The prompt asked for JSON, but the model kept wrapping it in markdown code blocks. My parser rejected it. The retry loop ran the same call again. And again. Each retry cost money. Total waste: about $140/month — from a six-word fix I could have made months ago.
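The prompt fix is the real answer, but a tolerant parser kills the retry loop too. Here's a minimal sketch (my own, not part of the tool) that strips a markdown code fence before parsing, so a fenced response no longer counts as a failure:

```python
import json
import re

def parse_json_response(raw: str):
    """Parse a model response as JSON, tolerating markdown code fences.

    Strips a leading ```json (or bare ```) fence and its closing ```
    before parsing, so fenced responses no longer trigger a retry.
    """
    text = raw.strip()
    match = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)
```

Two lines of defense: fix the prompt so fences rarely appear, and parse defensively so the occasional one doesn't cost a retry.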

85% of my classifier calls were duplicates. Same input, same output, full price every time. No caching. 723 of 847 weekly calls were completely redundant. A simple cache would have saved $310/month.
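The fix needs nothing fancy. A sketch of exact-input memoization (independent of any library; `call_model` stands in for whatever function actually hits the API):

```python
import hashlib

_cache: dict[str, str] = {}

def classify_cached(text: str, call_model) -> str:
    """Return a cached classification when the exact input was seen before.

    Duplicate inputs cost a dict lookup instead of a full-price API call.
    """
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(text)
    return _cache[key]
```

Hashing the input keeps the cache keys small even when the texts are long; for a production version you'd want a TTL and a size bound.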

My classifier was using GPT-4o for a yes/no task. The output was always under 10 tokens — one of five fixed labels. GPT-4o-mini produces identical results at a fraction of the cost. Savings: $71/month.

My chatbot was stuffing the entire conversation history into every call. By message 20, the input was 3,200 tokens and growing. Only the last few messages mattered. Truncating to the last five messages saves $155/month.
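The truncation itself is a few lines. A sketch assuming OpenAI-style message dicts with a "role" key (the five-message cutoff is what worked for my chatbot, not a universal rule):

```python
def truncate_history(messages, keep_last=5):
    """Keep the system prompt (if any) plus the last `keep_last` messages.

    Input tokens stay bounded no matter how long the conversation runs.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```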

Total: $1,240/month in waste out of a $2,847 monthly spend. That's 43%.

The tool: LLM Cost Profiler

I packaged all of this into an open-source Python CLI. Here's how it works.

Step 1: Install

pip install llm-spend-profiler

Step 2: Wrap your client (2 lines of code)

from llm_cost_profiler import wrap
from openai import OpenAI

client = wrap(OpenAI())

That's it. Your code works exactly as before. Every API call is now silently logged to a local SQLite database. If logging fails for any reason, it fails silently — your app is never affected.

Works with Anthropic too:

from anthropic import Anthropic
client = wrap(Anthropic())

Step 3: See where your money goes

$ llmcost report
LLM Cost Report — Last 7 Days
========================================
Total: $847.32 | 2.4M tokens | 12,847 calls

By Feature:
  summarizer         $412.80  (48.7%)  ████████████████████
  chatbot            $203.11  (24.0%)  ████████████
  classifier          $89.40  (10.5%)  █████
  content_gen         $78.22   (9.2%)  ████
  extraction          $41.50   (4.9%)  ██
  untagged            $22.29   (2.6%)  █

Warnings:
  ⚠ summarizer: 34% of calls are retries ($140.15 wasted)
  ⚠ chatbot: avg 3,200 input tokens but only 180 output tokens (context bloat)
  ⚠ classifier: using gpt-4o but output is always <10 tokens (cheaper model works)

Step 4: Find the waste

$ llmcost optimize
LLM Cost Optimization Report
========================================
Current monthly spend (projected): $2,847
Potential savings found: $1,240/month (43.5%)

  #1 CACHE — classifier.py:34                        [SAVE $310/mo]
     85% of calls are exact duplicates (723 of 847/week)
     → Add @cache decorator
     Confidence: HIGH

  #2 RETRY FIX — content_gen.py:112                   [SAVE $180/mo]
     28% retry rate from JSON parse errors
     → Fix prompt to return raw JSON
     Confidence: HIGH

  #3 MODEL DOWNGRADE — classifier.py:34               [SAVE $71/mo]
     Output is always <10 tokens, one of 5 fixed labels
     → Switch gpt-4o to gpt-4o-mini
     Confidence: MEDIUM

  #4 CONTEXT BLOAT — chatbot.py:123                   [SAVE $155/mo]
     Avg 3,200 input tokens, growing over conversation
     → Truncate history to last 5 messages
     Confidence: MEDIUM

Each recommendation includes the exact file and line number, estimated monthly savings, and a confidence level.

Other features worth knowing about

llmcost hotspots — ranks your code locations by cost. Auto-detected from the Python call stack, no manual annotation needed:

Top Cost Hotspots:
  1. features/summarizer.py:47   summarize_doc()    $412.80/week   4,201 calls
  2. api/chat.py:123             handle_message()   $203.11/week   3,892 calls
  3. pipeline/classify.py:34     classify_text()     $89.40/week   2,847 calls
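For the curious: call-site detection like this can be done by walking the stack at call time and skipping the profiler's own frames. A sketch of the general technique — the tool's actual implementation may differ:

```python
import traceback

def detect_call_site(skip_prefixes=("llm_cost_profiler",)):
    """Return the nearest stack frame outside the profiler itself,
    formatted as "path:lineno function"."""
    # [:-1] drops this function's own frame; walk from most recent outward.
    for frame in reversed(traceback.extract_stack()[:-1]):
        if not any(p in frame.filename for p in skip_prefixes):
            return f"{frame.filename}:{frame.lineno} {frame.name}"
    return "unknown"
```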

llmcost compare — week-over-week comparison to catch sudden spikes.

llmcost dashboard — opens a local web dashboard at localhost:8177 with treemap charts, cost timelines, and an optimization waterfall. Single HTML file, no npm, no build step.

Tagging — group costs by feature, customer, or environment:

from llm_cost_profiler import tag

with tag(feature="summarizer", customer="acme_corp"):
    response = client.chat.completions.create(...)
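A context manager like `tag` can be built on `contextvars`, which keeps the labels correct across threads and async tasks. This is a sketch of one plausible implementation, not the library's actual code:

```python
import contextvars
from contextlib import contextmanager

_tags: contextvars.ContextVar[dict] = contextvars.ContextVar(
    "llmcost_tags", default={}
)

@contextmanager
def tag(**labels):
    """Attach labels to every call made inside the block; nested
    blocks merge their labels with the outer ones."""
    token = _tags.set({**_tags.get(), **labels})
    try:
        yield
    finally:
        _tags.reset(token)

def current_tags() -> dict:
    """What the logger would read when recording a call."""
    return _tags.get()
```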

Caching decorator — stop paying for duplicate calls:

from llm_cost_profiler import cache

@cache(ttl=3600)
def classify_text(text):
    return client.chat.completions.create(...)

How it works under the hood

  • Wrapper: Transparent proxy pattern — intercepts method calls without monkey-patching.
  • Storage: SQLite with WAL mode at ~/.llmcost/data.db. Thread-safe.
  • Pricing: Built-in lookup table for OpenAI and Anthropic models.
  • Call site detection: Walks the Python call stack to auto-detect which function triggered each call.
  • Zero dependencies: Only uses the Python standard library.
  • Privacy: Everything stays local. Nothing is sent anywhere.

Try it on your codebase

If you're making LLM API calls in any project, I'm genuinely curious what it finds. In my experience, every codebase has at least one surprise — usually duplicate calls that nobody knew about.

GitHub: https://github.com/BuildWithAbid/llm-cost-profiler
Install: pip install llm-spend-profiler
License: MIT

If you find issues or have ideas for what else it should detect, open an issue or drop a comment here. This is my first open-source project and I'd love feedback.