Not All Frames Deserve Full Computation: Accelerating Autoregressive Video Generation via Selective Computation and Predictive Extrapolation
arXiv cs.CV / 4/6/2026
Key Points
- The paper introduces SCOPE, a training-free framework to speed up autoregressive (AR) video diffusion by reducing wasteful denoising work across frames.
- It uses a tri-modal scheduler—cache, predict, and recompute—so the method can handle intermediate situations where binary reuse/recompute decisions are too coarse.
- Prediction is performed with noise-level Taylor extrapolation, and the approach includes stability controls supported by error propagation analysis.
- SCOPE also applies selective computation by restricting execution to the active frame interval, avoiding uniform processing over the entire valid range.
- Experiments on MAGI-1 and SkyReels-V2 show up to 4.73× speedups with output quality comparable to the original, outperforming prior training-free baselines.
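The paper's summary above names the mechanism but not its implementation. As an illustrative sketch only, the tri-modal decision plus first-order Taylor extrapolation might look like the following; the threshold values, function names, and change metric are all assumptions, not SCOPE's actual code.

```python
import numpy as np

def taylor_extrapolate(history):
    """Predict the next denoising output from cached previous outputs
    via first-order (finite-difference) Taylor extrapolation.
    `history` is a list of arrays from earlier steps (hypothetical)."""
    if len(history) >= 2:
        # x_next ≈ x_t + (x_t - x_{t-1})
        return history[-1] + (history[-1] - history[-2])
    return history[-1]

def schedule(change, low=0.05, high=0.30):
    """Tri-modal scheduler: small feature change -> reuse the cache,
    moderate change -> predict via extrapolation, large change ->
    recompute fully. Thresholds are illustrative, not from the paper."""
    if change < low:
        return "cache"
    if change < high:
        return "predict"
    return "recompute"
```

A frame whose latent barely changed would be served from cache, an intermediate case would be extrapolated cheaply, and only rapidly changing frames would pay for a full denoising pass.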




