I analyzed 922 agentic task traces and found the secret weapon of DeepSeek v4

Reddit r/LocalLLaMA / 5/7/2026

💬 Opinion · Signals & Early Trends · Models & Research

Key Points

  • The author benchmarked DeepSeek v4 (v4 flash) on agentic tasks and found it performs among the best open-source models, but at an unexpectedly low cost.
  • Using OpenRouter pricing as a baseline, the estimated cost ratio versus Opus 4.7 suggested DeepSeek v4 flash should be only ~3% as expensive, yet the benchmark showed it costing about 0.66% of the Opus cost per task.
  • In long agentic runs, both models used a similar number of tokens per task (≈962K–966K), so the large cost gap wasn’t mainly explained by token volume.
  • The main “secret weapon” identified was a much higher cache hit rate for DeepSeek v4 flash (97% vs 87%) combined with a more favorable cache read-write price ratio (0.02 vs 0.08), which dramatically reduces effective cost.
  • The analysis was conducted by running long agent loops in openclaw with PI-style agent iteration, using OpenRouter as the model provider.

I recently benchmarked DeepSeek v4 on agentic tasks. Performance-wise, it's one of the best open-source models, as expected. What really surprised me is the cost. I mean, I knew it was cheap, but it's cheap in a way that doesn't really make sense.

Cost Estimation

Let's take v4 flash as the example, since it's not currently discounted (so its price better reflects the actual provider cost).

[Screenshot: deepseek v4 flash pricing on OpenRouter]

[Screenshot: opus 4.7 pricing on OpenRouter]

Looking at OpenRouter prices, DeepSeek v4 flash is about 0.03x the price of Opus 4.7. (We only look at input token price because in long agentic tasks, input tokens are the dominant cost.) So if v4 flash uses a similar amount of tokens in a task as Opus 4.7, the actual cost should be somewhere around 0.03x compared to using Opus.

Actual Data

Then I ran the benchmark: long agentic tasks running in openclaw (which uses PI for the agent loop), with OpenRouter as the model provider. The actual cost data blew my mind:

| | Avg Cost Per Task | Avg Tokens Per Task | Avg Tools Per Task |
|---|---|---|---|
| Opus 4.7 | $1.52 | 966.3K | 12.8 |
| DeepSeek v4 Flash | $0.01 | 961.8K | 14.8 |

Somehow DeepSeek v4 flash costs about 0.0066x per task compared to Opus 4.7, given a similar amount of token usage and tool calls per task. That's only about 1/5 of the price we estimated. How is that possible??
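To make the gap concrete, here's the quick arithmetic behind those two ratios, using only the per-task averages from the table above:

```python
# Compare the measured per-task cost ratio against the estimate
# derived from OpenRouter input-token prices.
estimated_ratio = 0.03            # v4 flash vs. Opus 4.7, from list prices
actual_ratio = 0.01 / 1.52        # avg cost per task: v4 flash / Opus 4.7

print(f"actual ratio:   {actual_ratio:.4f}")                    # ~0.0066
print(f"vs. estimate:   {actual_ratio / estimated_ratio:.2f}x")  # ~0.22, i.e. ~1/5
```

So the measured cost is roughly a fifth of what the list prices alone would predict, which is the discrepancy the rest of the post tries to explain.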

The Secret Weapon

After digging into the raw data and collecting more detailed stats, I finally found out why. The secret is the cache hit rate and the cache read price.

| | Cache Hit Rate | Cache Read-Write Price Ratio |
|---|---|---|
| Opus 4.7 | 87% | 0.08 |
| DeepSeek v4 Flash | 97% | 0.02 |

The main factor in this case is cache hit rate. DeepSeek somehow managed to achieve 97% cache hit rate!!!

Just in case you don't know how important this number is: at this cache hit rate and read/write price ratio, a 1% higher cache hit rate means about 20% lower overall cost.
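You can check this with a simple cost model (my assumption of how the billing works, not DeepSeek's documented formula): cache hits are billed at the read/write ratio times the miss price, and misses at full input price.

```python
def effective_cost(hit_rate: float, read_write_ratio: float) -> float:
    """Effective input cost per token, relative to the cache-miss price.

    Assumes hits are billed at read_write_ratio * miss price,
    and misses at full price.
    """
    return hit_rate * read_write_ratio + (1 - hit_rate)

# At DeepSeek's numbers, one extra point of hit rate saves ~20% overall:
c97 = effective_cost(0.97, 0.02)   # 0.97*0.02 + 0.03 = 0.0494
c98 = effective_cost(0.98, 0.02)   # 0.98*0.02 + 0.02 = 0.0396
print(f"savings from 97% -> 98% hit rate: {1 - c98 / c97:.0%}")  # ~20%
```

The leverage comes from the hit rate being so close to 100%: each extra point removes a full point from the expensive miss term, which at 0.97 is most of the remaining cost.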

DS got a 10% higher cache hit rate than Opus. That alone (at DS's 0.02 read/write ratio) cut about 2/3 of the total cost.

The secondary factor is the extremely low read/write price ratio: each cache hit costs only 0.02x of a cache miss on DS, while on Opus it's 0.08x. This is also pretty insane, as OpenAI/Anthropic/Gemini are all in the 0.08~0.1 range. This alone further cuts the overall cost roughly in half.

The above are just my experiments, measurements, and stats. I have no idea how DS achieved those numbers. I'd appreciate it if someone who knows this better could explain (or speculate).

submitted by /u/zylskysniper