Anchor Forcing: Anchor Memory and Tri-Region RoPE for Interactive Streaming Video Diffusion
arXiv cs.CV / 3/17/2026
Key Points
- The paper introduces Anchor Forcing, a cache-centric framework that addresses two failure modes of interactive streaming video diffusion: loss of boundary conditioning at prompt switches, and drift in motion priors caused by unbounded time indexing.
- It proposes an anchor-guided re-cache mechanism that stores KV states in anchor caches and warm-starts the re-cache from these anchors at each prompt switch, reducing post-switch evidence loss and stabilizing perceptual quality.
- It also presents a tri-region RoPE with region-specific reference origins, together with RoPE re-alignment distillation, to reconcile unbounded streaming indices with the pretrained RoPE regime and better retain long-horizon motion priors.
- Experiments on long videos show improved perceptual quality and motion metrics over prior streaming baselines; a project page provides implementation details.
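
To make the "unbounded time indexing" point concrete, here is a minimal sketch of how region-specific reference origins could keep RoPE indices bounded as a stream grows. This is not the paper's implementation. The three region names (anchor, history, local), their lengths, the origins, and the helpers `build_cache` and `remap_indices` are all assumptions chosen for illustration; the summary above does not specify them.

```python
# Illustrative sketch only: region names, sizes, and origins are hypothetical.

def build_cache(t, switch_frame=0, anchor_len=2, history_len=4, local_len=3):
    """Return the absolute frame indices held in each cache region at stream step t.

    Assumed layout: 'anchor' holds frames right after the last prompt switch,
    'local' holds the most recent frames, 'history' holds the frames just before them.
    """
    local_start = t - local_len + 1
    history_start = local_start - history_len
    return {
        "anchor": list(range(switch_frame, switch_frame + anchor_len)),
        "history": list(range(history_start, local_start)),
        "local": list(range(local_start, t + 1)),
    }


def remap_indices(cache, origins, max_trained_index):
    """Map absolute frame indices to bounded RoPE indices.

    Each region is re-indexed relative to its own reference origin, so the
    result depends on position inside the region, not on absolute stream time.
    """
    remapped = {}
    for region, frames in cache.items():
        base = frames[0] if frames else 0
        indices = [origins[region] + (f - base) for f in frames]
        if indices and indices[-1] > max_trained_index:
            raise ValueError(f"region {region!r} exceeds the trained index range")
        remapped[region] = indices
    return remapped


# Hypothetical origins: regions are laid out back to back in index space.
ORIGINS = {"anchor": 0, "history": 2, "local": 6}

early = remap_indices(build_cache(t=20), ORIGINS, max_trained_index=31)
late = remap_indices(build_cache(t=10_000), ORIGINS, max_trained_index=31)

print(early)
print(early == late)
```

With naive absolute indexing, the frame at step 10,000 would receive index 10,000, far outside any range a model with a short pretraining horizon has seen. In this sketch both stream steps produce identical, bounded indices, which is the property the tri-region scheme is described as targeting. The distillation step mentioned in the summary is not modeled here.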