Holy cow, if you guys are running background agents or heavy tool-calling pipelines, you need to test the new Deepseek v4 flash model immediately.
For context, I maintain an open-source agent platform - basically a persistent daemon that handles background Python execution and SQLite state management. Because our agents run 24/7, sometimes making hundreds of tool calls an hour, API costs are usually our biggest bottleneck.
Up until yesterday, Deepseek 3.2 was our primary low-cost model. Insane price and comparable perf to SOTA models. But we just hot-swapped v4 flash into our routing, and it's kind of mind-blowing.
A couple things I'm noticing right away:
Tool calling is way sharper. It's nailing our complex JSON schemas natively without hallucinating weird markdown wrappers or dropping keys.
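For a concrete sense of what "sharper" means here: our tool definitions are standard OpenAI-compatible JSON schemas, and our harness rejects any arguments blob that comes back wrapped in a markdown fence or missing required keys. A minimal sketch of that strict-parse check below - the `stash_summary` tool and its fields are illustrative, not our actual schema:

```python
import json

# Illustrative tool definition in the OpenAI-compatible "tools" format.
# The name and fields are made up for this example.
STASH_TOOL = {
    "type": "function",
    "function": {
        "name": "stash_summary",
        "description": "Persist a web-page summary into agent state.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string"},
                "summary": {"type": "string"},
                "tags": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["url", "summary"],
        },
    },
}

def parse_tool_args(raw: str) -> dict:
    """Strictly parse a tool call's arguments string.

    Rejects the two failure modes we used to see with weaker models:
    JSON wrapped in a ```-fence, and dropped required keys.
    """
    if raw.lstrip().startswith("```"):
        raise ValueError("model wrapped JSON in a markdown fence")
    args = json.loads(raw)
    required = STASH_TOOL["function"]["parameters"]["required"]
    missing = [k for k in required if k not in args]
    if missing:
        raise ValueError(f"model dropped required keys: {missing}")
    return args
```

With 3.2 we'd occasionally trip both branches of that validator; with v4 flash it's been passing clean.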
Also, we do a ton of continuous context stuffing (scraping web data, summarizing it, stashing it in SQLite), and it just doesn't lose the thread even w/ high-context workloads. All this AND it's literally cheaper than 3.2.
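The scrape-summarize-stash loop is simpler than it sounds. Roughly this shape, with stdlib sqlite3 - the table name and columns here are illustrative, not our actual schema:

```python
import sqlite3

def init_state(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the agent's SQLite state store."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS summaries (
               url        TEXT PRIMARY KEY,
               summary    TEXT NOT NULL,
               scraped_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn

def stash(conn: sqlite3.Connection, url: str, summary: str) -> None:
    """Upsert a model-produced summary keyed by source URL."""
    conn.execute(
        "INSERT OR REPLACE INTO summaries (url, summary) VALUES (?, ?)",
        (url, summary),
    )
    conn.commit()

def recall(conn: sqlite3.Connection, limit: int = 20) -> str:
    """Render the most recent summaries as a context block for the next prompt."""
    rows = conn.execute(
        "SELECT url, summary FROM summaries ORDER BY scraped_at DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return "\n".join(f"[{u}] {s}" for u, s in rows)
```

Each cycle the agent summarizes fresh scrapes, `stash()`es them, then `recall()` gets prepended to the next prompt - so the model is constantly re-reading its own condensed history, which is exactly the workload that used to make weaker models lose the thread.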
We also use Gemini 3.1 pro for our agents that need the extra smarts, but v4 pro might replace that as well.
If anyone is curious about the architecture we're plugging this into, the open source repo is called Gobii. But honestly, I'm just here to validate the hype. We're making v4 flash + pro the default for our whole orchestration stack (pro for more complex workloads).
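The flash/pro split in our router is nothing fancy - a heuristic along these lines, where the model ID strings and thresholds are placeholders, not confirmed API names:

```python
# Placeholder model IDs - check your provider's docs for the real strings.
FLASH = "deepseek-v4-flash"
PRO = "deepseek-v4-pro"

def pick_model(task: dict) -> str:
    """Route a task to flash by default, escalating to pro for heavy work.

    `task` is a plain dict of scheduler metadata; the keys and the
    25-call threshold are illustrative.
    """
    if task.get("complexity", "low") == "high":
        return PRO
    if task.get("planned_tool_calls", 0) > 25:
        return PRO
    return FLASH
```

Cheap default, expensive escape hatch - since flash handles the bulk of the tool-calling grind, pro only gets the long-horizon planning tasks.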
Anyone else benchmarking its JSON/tool-calling reliability yet? Curious if you're seeing the same bumps.