Anchor Forcing: Anchor Memory and Tri-Region RoPE for Interactive Streaming Video Diffusion
arXiv cs.CV / 3/17/2026
Key Points
- The paper introduces Anchor Forcing, a cache-centric framework that targets two failure modes of interactive streaming video diffusion: loss of boundary conditioning at prompt switches, and drift in motion priors caused by unbounded time indexing.
- It proposes an anchor-guided re-cache mechanism that stores KV states in anchor caches and warm-starts the re-cache from these anchors at each prompt switch, reducing post-switch evidence loss and stabilizing perceptual quality.
- It also presents a tri-region RoPE (rotary position embedding) with region-specific reference origins, together with RoPE re-alignment distillation, to reconcile unbounded streaming indices with the pretrained RoPE regime and better retain long-horizon motion priors.
- Experiments on long videos show improved perceptual quality and motion metrics over prior streaming baselines, with a project page provided for implementation details.
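
The two mechanisms in the key points can be pictured with a toy sketch. This is not the paper's implementation: the class `AnchorCache`, the function `tri_region_positions`, the region names (anchor, history, current), the origin values, and the `reencode` callback are all hypothetical and chosen only for illustration. The sketch assumes that the re-cache at a prompt switch is rebuilt from the stored anchors, and that each of the three regions counts positions from its own fixed origin so that indices stay bounded however long the stream runs.

```python
from dataclasses import dataclass, field


@dataclass
class AnchorCache:
    """Toy stand-in for a KV cache with a separate anchor store (hypothetical)."""
    window: int                                  # max recent entries kept
    anchors: list = field(default_factory=list)  # KV states saved as anchors
    recent: list = field(default_factory=list)   # rolling KV states

    def append(self, kv, is_anchor=False):
        # Optionally keep this state as an anchor, then roll the window.
        if is_anchor:
            self.anchors.append(kv)
        self.recent.append(kv)
        self.recent = self.recent[-self.window:]

    def recache_on_prompt_switch(self, reencode):
        # Assumed behaviour: warm-start the rolling cache from the anchors,
        # re-encoded under the new prompt, instead of from an empty cache.
        self.recent = [reencode(kv) for kv in self.anchors][-self.window:]


def tri_region_positions(n_anchor, n_history, n_current, origins=(0, 64, 128)):
    """Assign position ids per region, each counted from its own origin.

    The ids depend only on region sizes, not on absolute stream time,
    so they stay inside a fixed range (origin values are illustrative).
    """
    positions = []
    for size, origin in zip((n_anchor, n_history, n_current), origins):
        positions.extend(origin + i for i in range(size))
    return positions


# Usage: six streamed steps, anchors saved at steps 0 and 3, then a prompt switch.
cache = AnchorCache(window=3)
for t in range(6):
    cache.append(("old", t), is_anchor=(t in (0, 3)))
cache.recache_on_prompt_switch(lambda kv: ("new", kv[1]))
print(cache.recent)
print(tri_region_positions(2, 3, 2))
```

In this sketch the position ids never exceed the largest origin plus the region size, which is the sense in which a region-relative scheme could keep streaming indices within a pretrained range; how the paper actually defines its regions and origins is left to the project page.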