DeepSeek AI Releases DeepSeek-V4: Compressed Sparse Attention and Heavily Compressed Attention Enable One-Million-Token Contexts

MarkTechPost / 4/25/2026


Key Points

  • DeepSeek-AI released a preview of the DeepSeek-V4 model series aimed at making one-million-token context windows practical and cost-effective during inference.
  • The release features two Mixture-of-Experts (MoE) variants, DeepSeek-V4-Pro and DeepSeek-V4-Flash, designed with different total and per-token activated parameter scales.
  • DeepSeek-V4-Pro is described as having 1.6T total parameters with 49B activated per token, while DeepSeek-V4-Flash has 284B total parameters with 13B activated per token.
  • The core technical claim is that “compressed sparse attention” and “heavily compressed attention” reduce the compute/memory burden needed to support extremely long contexts (see the back-of-envelope sketch after this list).
  • The preview positioning suggests the models are being introduced for early adoption and evaluation rather than as a final, generally available release.
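The preview does not spell out the internals of these attention schemes, so the following is purely illustrative. As a rough intuition for why attention compression matters at one million tokens, here is a back-of-envelope sketch of KV-cache memory for a hypothetical dense-attention model versus one that caches a compressed per-token latent; every architecture constant below (layer count, head count, latent width) is an assumption for illustration, not a published DeepSeek-V4 figure.

```python
# Back-of-envelope KV-cache math for a 1M-token context.
# All architecture numbers below are ILLUSTRATIVE ASSUMPTIONS,
# not DeepSeek-V4's published configuration.

CTX = 1_000_000          # context length in tokens
LAYERS = 60              # assumed transformer layers
KV_HEADS = 8             # assumed KV heads (e.g. grouped-query attention)
HEAD_DIM = 128           # assumed per-head dimension
BYTES = 2                # fp16/bf16 bytes per element

# Dense KV cache: two tensors (K and V) per layer, per token.
dense_bytes = 2 * CTX * LAYERS * KV_HEADS * HEAD_DIM * BYTES
print(f"dense KV cache:      {dense_bytes / 2**30:.1f} GiB")

# A compressed-attention scheme that stores a single low-rank
# latent per token per layer instead of full K/V heads.
LATENT_DIM = 512         # assumed compressed latent width
compressed_bytes = CTX * LAYERS * LATENT_DIM * BYTES
print(f"compressed KV cache: {compressed_bytes / 2**30:.1f} GiB")
print(f"reduction:           {dense_bytes / compressed_bytes:.1f}x")
```

Under these assumed numbers, the dense cache alone would occupy roughly 229 GiB in fp16 for a single one-million-token sequence, which is the kind of inference-time burden the compression claims are aimed at.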

DeepSeek-AI has released a preview version of the DeepSeek-V4 series: two Mixture-of-Experts (MoE) language models built around a single core challenge, making one-million-token context windows practical and affordable at inference time. The series consists of DeepSeek-V4-Pro, with 1.6T total parameters and 49B activated per token, and DeepSeek-V4-Flash, with 284B total parameters and 13B activated per token. […]
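The gap between total and activated parameters is characteristic of MoE designs: a router selects a small top-k subset of experts for each token, so only a fraction of the weights participate in any single forward pass. The sketch below shows generic top-k routing in PyTorch; it is a minimal illustration of the mechanism, not DeepSeek-V4's actual router, and all shapes and the value of k are assumptions.

```python
# Generic top-k Mixture-of-Experts routing sketch.
# Shapes and k are ILLUSTRATIVE, not DeepSeek-V4's configuration.
import torch
import torch.nn.functional as F

def moe_forward(x, gate_w, experts, k=2):
    """x: [tokens, d_model]; gate_w: [d_model, n_experts];
    experts: list of per-expert MLPs. Routes each token to k experts."""
    scores = F.softmax(x @ gate_w, dim=-1)          # [tokens, n_experts]
    topk_scores, topk_idx = scores.topk(k, dim=-1)  # pick k experts per token
    out = torch.zeros_like(x)
    for slot in range(k):
        for e, expert in enumerate(experts):
            mask = topk_idx[:, slot] == e           # tokens routed to expert e
            if mask.any():
                # Weight each expert's output by its gate score
                # (not renormalized over the top-k; fine for a sketch).
                out[mask] += topk_scores[mask, slot, None] * expert(x[mask])
    return out

d, n_experts = 64, 16
experts = [torch.nn.Sequential(torch.nn.Linear(d, 4 * d),
                               torch.nn.GELU(),
                               torch.nn.Linear(4 * d, d))
           for _ in range(n_experts)]
gate_w = torch.randn(d, n_experts)
y = moe_forward(torch.randn(10, d), gate_w, experts)
print(y.shape)  # torch.Size([10, 64]); only k of 16 experts ran per token
```

At DeepSeek-V4-Pro's stated scale, the same principle means 49B of 1.6T parameters are active per token, an activation ratio of roughly 3%.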
