I recently benchmarked DeepSeek v4 on agentic tasks. Performance-wise, it's one of the best open-source models, as expected. What really surprised me is the cost. I knew it was cheap, but it's cheap in a way that doesn't really make sense.

Cost Estimation

Let's take v4 flash as the example, since it's not on sale and so better reflects the actual provider cost.

[Image: DeepSeek v4 flash price on OpenRouter]

Going by OpenRouter pricing, DeepSeek v4 flash costs about 0.03x as much as Opus 4.7. (We only look at input token price because in long agentic tasks, input tokens are the dominant cost.) So if v4 flash uses a similar number of tokens per task as Opus 4.7, the actual cost should be somewhere around 0.03x that of Opus.

Actual Data

Then I ran the benchmark: long agentic tasks running in openclaw (which uses PI for the agent loop), with OpenRouter as the model provider. The actual cost data blew my mind:

Somehow DeepSeek v4 flash cost about 0.0066x as much per task as Opus 4.7, given a similar amount of token usage and tool calls per task. That's only about 1/5 of the estimated price. How is that possible??

The Secret Weapon

After digging into the raw data and collecting more detailed stats, I finally found out why. The secret is cache hit rate and cache read price.

The main factor in this case is cache hit rate. DeepSeek somehow managed to achieve a 97% cache hit rate!!! In case you don't know how important this number is: at this cache hit rate and read/write price ratio, a cache hit rate 1 percentage point higher means about 20% lower overall cost. DS got a cache hit rate 10 percentage points higher than Opus. That alone cuts about 2/3 of the total cost.

The secondary factor is the extremely low read/write price ratio: each cache hit costs only 0.02x as much as a cache miss on DS, while on Opus it is 0.08x. This is also pretty insane, as OpenAI/Anthropic/Gemini are all at 0.08~0.1. This alone can further cut the overall cost by about half.

The above are just my experiments, measurements, and stats. I have no idea how DS achieved those numbers. I'd appreciate it if someone who knows this better could explain (or speculate).
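The arithmetic behind the percentages in the post can be checked with a small sketch. It is not the author's analysis code. It assumes a simplified model in which every cache miss is billed at the list input price, every cache hit at `read_ratio` times that price, and output tokens and any cache-write surcharge are ignored. The function name `blended_input_cost` is made up for illustration. The inputs (97% vs 87% hit rate, 0.02 vs 0.08 read ratio, 0.03x list price) are the figures quoted in the post.

```python
def blended_input_cost(hit_rate, read_ratio, miss_price=1.0):
    """Average price per input token, in units of the list input price.

    Assumption: misses cost miss_price, hits cost read_ratio * miss_price.
    """
    return miss_price * (hit_rate * read_ratio + (1.0 - hit_rate))


# Figures quoted in the post.
ds = blended_input_cost(hit_rate=0.97, read_ratio=0.02)
opus = blended_input_cost(hit_rate=0.87, read_ratio=0.08)

# Saving from one extra percentage point of hit rate (97% -> 98%).
one_point_saving = 1 - blended_input_cost(0.98, 0.02) / ds

# Saving from hit rate alone (87% -> 97%), read ratio held at 0.02.
hit_rate_saving = 1 - ds / blended_input_cost(0.87, 0.02)

# Saving from read ratio alone (0.08 -> 0.02), hit rate held at 97%.
read_ratio_saving = 1 - ds / blended_input_cost(0.97, 0.08)

# Modeled per-task cost ratio: 0.03x list price times the caching advantage.
modeled_ratio = 0.03 * ds / opus

print(f"one extra point of hit rate: {one_point_saving:.1%} cheaper")
print(f"hit rate 87% -> 97%:         {hit_rate_saving:.1%} cheaper")
print(f"read ratio 0.08 -> 0.02:     {read_ratio_saving:.1%} cheaper")
print(f"modeled cost vs Opus:        {modeled_ratio:.4f}x")
```

Under these assumptions the model reproduces the roughly 20%, 2/3, and one-half savings described above, and gives a cost ratio of about 0.0074x, in the same range as the measured 0.0066x. The remaining gap is not explained by this simple model.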
I analyzed 922 agentic task traces and found the secret weapon of DeepSeek v4
Reddit r/LocalLLaMA / 5/7/2026
💬 Opinion | Signals & Early Trends | Models & Research
Key Points
- The author benchmarked DeepSeek v4 (v4 flash) on agentic tasks and found it performs among the best open-source models, but at an unexpectedly low cost.
- Using OpenRouter input-token pricing as a baseline, DeepSeek v4 flash was expected to cost only ~3% as much as Opus 4.7, yet the benchmark showed it costing only about 0.66% as much per task.
- In long agentic runs, both models used a similar number of tokens per task (≈962K–966K), so the large cost gap wasn’t mainly explained by token volume.
- The main “secret weapon” identified was a much higher cache hit rate for DeepSeek v4 flash (97% vs 87%) combined with a more favorable cache read-write price ratio (0.02 vs 0.08), which dramatically reduces effective cost.
- The analysis was conducted by running long agentic tasks in openclaw, which uses PI for its agent loop, with OpenRouter as the model provider.