I was spending about $2,000/month on OpenAI and Anthropic APIs across a few projects.
I knew some of it was wasteful. I just couldn't prove it. The provider dashboards show you one number — your total bill. That's like getting an electricity bill with no breakdown. Is it the AC? The lights? The server room? No idea.
So I built a tool to find out. What it discovered was honestly embarrassing.
What I found
34% of my summarizer calls were retries. The prompt asked for JSON, but the model kept wrapping it in markdown code blocks. My parser rejected it. The retry loop ran the same call again. And again. Each retry cost money. Total waste: about $140/month, all avoidable with a six-word prompt fix I could have made months ago.
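The eventual fix was a prompt change, but a tolerant parser would have prevented the retries in the first place. A minimal sketch (the function name and regex are mine, not part of any library):

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Parse JSON from an LLM response, tolerating a markdown code fence."""
    # Strip an optional ```json ... ``` wrapper before handing off to json.loads.
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    payload = match.group(1) if match else raw
    return json.loads(payload)
```

With this in place, a fenced response parses the same as a raw one, and the retry loop never fires.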
85% of my classifier calls were duplicates. Same input, same output, full price every time. No caching. 723 of 847 weekly calls were completely redundant. A simple cache would have saved $310/month.
My classifier was using GPT-4o for a yes/no task. The output was always under 10 tokens — one of five fixed labels. GPT-4o-mini produces identical results at a fraction of the cost. Savings: $71/month.
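The downgrade math is simple per-token arithmetic. The prices below are illustrative placeholders, not a quote (provider pricing changes; check the current rate card):

```python
# Illustrative USD prices per 1M tokens: (input, output). Not current rates.
PRICES = {
    "gpt-4o":      (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def monthly_cost(model: str, calls: int, in_tokens: int, out_tokens: int) -> float:
    """Projected monthly spend for one call pattern on one model."""
    p_in, p_out = PRICES[model]
    return calls * (in_tokens * p_in + out_tokens * p_out) / 1_000_000

big = monthly_cost("gpt-4o", 30_000, 400, 10)
small = monthly_cost("gpt-4o-mini", 30_000, 400, 10)
```

For a short-output classification task the gap is dramatic, which is why tiny output length (under 10 tokens) is the signal worth watching for.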
My chatbot was stuffing the entire conversation history into every call. By message 20, the input was 3,200 tokens and growing. Only the last few messages mattered. Truncating to the last 5 saves $155/month.
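The truncation itself is a few lines. A sketch that keeps the system prompt plus the last five turns (my helper, not the chatbot's actual code):

```python
def truncate_history(messages: list[dict], keep_last: int = 5) -> list[dict]:
    """Keep any system messages plus the last `keep_last` conversation turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```

Smarter variants summarize the dropped turns instead of discarding them, but even the blunt version caps input tokens at a constant instead of letting them grow with every message.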
Total: $1,240/month in waste out of a $2,847 monthly spend. That's 43.5%.
The tool: LLM Cost Profiler
I packaged all of this into an open-source Python CLI. Here's how it works.
Step 1: Install
pip install llm-spend-profiler
Step 2: Wrap your client (2 lines of code)
from llm_cost_profiler import wrap
from openai import OpenAI
client = wrap(OpenAI())
That's it. Your code works exactly as before. Every API call is now silently logged to a local SQLite database. If logging fails for any reason, it fails silently — your app is never affected.
Works with Anthropic too:
from anthropic import Anthropic
client = wrap(Anthropic())
Step 3: See where your money goes
$ llmcost report
LLM Cost Report — Last 7 Days
========================================
Total: $847.32 | 2.4M tokens | 12,847 calls
By Feature:
summarizer $412.80 (48.7%) ████████████████████
chatbot $203.11 (24.0%) ████████████
classifier $89.40 (10.5%) █████
content_gen $78.22 (9.2%) ████
extraction $41.50 (4.9%) ██
untagged $22.29 (2.6%) █
Warnings:
⚠ summarizer: 34% of calls are retries ($140.15 wasted)
⚠ chatbot: avg 3,200 input tokens but only 180 output tokens (context bloat)
⚠ classifier: using gpt-4o but output is always <10 tokens (cheaper model works)
Step 4: Find the waste
$ llmcost optimize
LLM Cost Optimization Report
========================================
Current monthly spend (projected): $2,847
Potential savings found: $1,240/month (43.5%)
#1 CACHE — classifier.py:34 [SAVE $310/mo]
85% of calls are exact duplicates (723 of 847/week)
→ Add @cache decorator
Confidence: HIGH
#2 RETRY FIX — content_gen.py:112 [SAVE $180/mo]
28% retry rate from JSON parse errors
→ Fix prompt to return raw JSON
Confidence: HIGH
#3 MODEL DOWNGRADE — classifier.py:34 [SAVE $71/mo]
Output is always <10 tokens, one of 5 fixed labels
→ Switch gpt-4o to gpt-4o-mini
Confidence: MEDIUM
#4 CONTEXT BLOAT — chatbot.py:123 [SAVE $155/mo]
Avg 3,200 input tokens, growing over conversation
→ Truncate history to last 5 messages
Confidence: MEDIUM
Each recommendation includes the exact file and line number, estimated monthly savings, and a confidence level.
Other features worth knowing about
llmcost hotspots — ranks your code locations by cost. Auto-detected from the Python call stack, no manual annotation needed:
Top Cost Hotspots:
1. features/summarizer.py:47 summarize_doc() $412.80/week 4,201 calls
2. api/chat.py:123 handle_message() $203.11/week 3,892 calls
3. pipeline/classify.py:34 classify_text() $89.40/week 2,847 calls
llmcost compare — week-over-week comparison to catch sudden spikes.
llmcost dashboard — opens a local web dashboard at localhost:8177 with treemap charts, cost timelines, and an optimization waterfall. Single HTML file, no npm, no build step.
Tagging — group costs by feature, customer, or environment:
from llm_cost_profiler import tag
with tag(feature="summarizer", customer="acme_corp"):
    response = client.chat.completions.create(...)
Caching decorator — stop paying for duplicate calls:
from llm_cost_profiler import cache
@cache(ttl=3600)
def classify_text(text):
    return client.chat.completions.create(...)
How it works under the hood
- Wrapper: Transparent proxy pattern — intercepts method calls without monkey-patching.
- Storage: SQLite with WAL mode at ~/.llmcost/data.db. Thread-safe.
- Pricing: Built-in lookup table for OpenAI and Anthropic models.
- Call site detection: Walks the Python call stack to auto-detect which function triggered each call.
- Zero dependencies: Only uses the Python standard library.
- Privacy: Everything stays local. Nothing is sent anywhere.
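To make the wrapper concrete, here is a minimal reconstruction of the transparent-proxy idea (my sketch, not the library's actual implementation): attribute access is forwarded to the real client, nested namespaces like client.chat.completions are re-wrapped, and method calls are logged on the way through.

```python
class Wrapped:
    """Transparent proxy: existing client code runs unchanged."""

    def __init__(self, target, log=None):
        self._target = target
        self._log = log if log is not None else []

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        if callable(attr):
            def logged(*args, **kwargs):
                result = attr(*args, **kwargs)
                # A real profiler would record model, token counts, and cost here.
                self._log.append(name)
                return result
            return logged
        # Recurse so chained access like client.chat.completions still works.
        return Wrapped(attr, self._log)
```

Because nothing is monkey-patched, the SDK itself is untouched; putting the logging line inside a try/except is all it takes to get the fail-silent behavior described earlier.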
Try it on your codebase
If you're making LLM API calls in any project, I'm genuinely curious what it finds. In my experience, every codebase has at least one surprise — usually duplicate calls that nobody knew about.
GitHub: https://github.com/BuildWithAbid/llm-cost-profiler
Install: pip install llm-spend-profiler
License: MIT
If you find issues or have ideas for what else it should detect, open an issue or drop a comment here. This is my first open-source project and I'd love feedback.




