Decreased Intelligence Density in DeepSeek V4 Pro

Reddit r/LocalLLaMA / 4/25/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The discussion claims that DeepSeek V4 Pro uses more tokens than DeepSeek V3.2 even in non-thinking mode, indicating reduced “intelligence density.”
  • It notes that V4 Pro (1.6T) is much larger than V3.2 (0.67T), and the token usage increase suggests efficiency did not improve.
  • Compared with GPT-5.4 and GPT-5.5, the gap is reported to be larger, with DeepSeek allegedly needing around 10× more tokens for similar performance.
  • Given similar token processing speeds (TPS), the post infers DeepSeek V4 Pro may take roughly 10× longer to complete the same tasks.
  • Overall, the excerpt challenges the expectation that scaling would optimize reasoning efficiency, arguing that compute/token efficiency has worsened in the newer model.

In the V3.2 paper, they mentioned:

Second, token efficiency remains a challenge; DeepSeek-V3.2 typically requires longer generation trajectories (i.e., more tokens) to match the output quality of models like Gemini 3.0-Pro. Future work will focus on optimizing the intelligence density of the model’s reasoning chains to improve efficiency.

However, in V4 Pro, the situation seems to have worsened. Even the non-thinking mode uses significantly more tokens than V3.2, and V4 Pro (1.6T parameters) is roughly 2.4x larger than V3.2 (0.67T). This suggests that the intelligence density of the model has decreased rather than improved!

If we compare it with GPT-5.4 and GPT-5.5, the gap is even larger: DeepSeek appears to require around 10x more tokens to achieve similar performance. Assuming the same tokens-per-second (TPS) throughput, this implies DeepSeek V4 Pro would take roughly 10x the wall-clock time to complete the same task.
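The arithmetic behind that inference can be sketched as follows. This is a minimal illustration, assuming constant decoding throughput; the TPS value and token counts are made-up placeholders standing in for the post's "10x more tokens" claim, not measured benchmarks.

```python
# Illustrative sketch: if throughput (tokens/second) is held constant,
# wall-clock completion time scales linearly with generated tokens.
# All numbers are hypothetical assumptions, not measurements.

def completion_time(tokens: int, tps: float) -> float:
    """Seconds of wall-clock time to generate `tokens` at `tps` tokens/second."""
    return tokens / tps

TPS = 50.0                  # assumed identical serving throughput for both models
tokens_gpt = 1_000          # assumed tokens GPT-5.x needs for some task
tokens_deepseek = 10_000    # ~10x more tokens, per the post's claim

t_gpt = completion_time(tokens_gpt, TPS)        # 20.0 s
t_ds = completion_time(tokens_deepseek, TPS)    # 200.0 s
print(f"slowdown: {t_ds / t_gpt:.0f}x")         # slowdown: 10x
```

In other words, with TPS fixed, the token-count ratio is the latency ratio, which is why lower "intelligence density" translates directly into slower task completion.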

submitted by /u/Mindless_Pain1860