Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation
arXiv cs.CV · April 14, 2026
Key Points
- The paper introduces “Hybrid Forcing,” a hybrid attention architecture aimed at improving long-horizon streaming video generation by better retaining distant temporal history than sliding-window attention alone.
- It combines lightweight linear temporal attention with a compact key-value state that absorbs and retains evicted tokens, and block-sparse local attention that cuts redundant short-range computation.
- The authors propose a decoupled distillation scheme: few-step distillation is first performed under dense attention, after which the linear and block-sparse attention components are activated and distilled, stabilizing training.
- Experiments across short- and long-form video generation benchmarks report state-of-the-art performance, including real-time, unbounded 832×480 generation at 29.5 FPS on a single NVIDIA H100 GPU without quantization or compression.
- Code and trained models are provided via the linked GitHub repository, enabling replication and further development of the method.
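The core mechanism in the second bullet can be sketched in a few lines: tokens that fall out of the local attention window are folded into a compact linear-attention state, and each query then combines local softmax attention with a readout from that state. This is a minimal illustrative sketch, not the paper's implementation; the feature map `phi`, the fixed 50/50 mixing, and all function names are assumptions for illustration.

```python
import numpy as np

def phi(x):
    # Positive feature map for linear attention (an illustrative choice)
    return np.maximum(x, 0.0) + 1e-6

def absorb_evicted(S, z, k_out, v_out):
    """Fold tokens evicted from the local window into a compact KV state."""
    S = S + phi(k_out).T @ v_out       # running sum of phi(k) v^T, shape (d, d)
    z = z + phi(k_out).sum(axis=0)     # running normalizer, shape (d,)
    return S, z

def hybrid_attention(q, K_win, V_win, S, z):
    """Combine local softmax attention with a linear-attention readout."""
    d = q.shape[-1]
    # Local (windowed) softmax attention over the recent tokens
    scores = K_win @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    local = (w[:, None] * V_win).sum(axis=0) / w.sum()
    # Linear-attention readout over distant history: phi(q)^T S / (phi(q)^T z)
    qf = phi(q)
    distant = (qf @ S) / (qf @ z + 1e-6)
    return 0.5 * (local + distant)     # fixed mixing for illustration

# Toy usage: 4-dim features, a window of 3 recent tokens, 2 evicted tokens
rng = np.random.default_rng(0)
d = 4
S, z = np.zeros((d, d)), np.zeros(d)
S, z = absorb_evicted(S, z, rng.standard_normal((2, d)), rng.standard_normal((2, d)))
out = hybrid_attention(rng.standard_normal(d),
                       rng.standard_normal((3, d)),
                       rng.standard_normal((3, d)),
                       S, z)
print(out.shape)  # (4,)
```

The point of the compact state is that its size is fixed (d×d here) no matter how many tokens have been evicted, which is what allows unbounded-length streaming at constant memory.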