FrescoDiffusion: 4K Image-to-Video with Prior-Regularized Tiled Diffusion
arXiv cs.CV / 3/19/2026
Key Points
- FrescoDiffusion introduces a training-free method for coherent large-format image-to-video generation from a single complex image, targeting 4K resolutions.
- The method augments tiled denoising with a precomputed latent prior: it first generates a low-resolution video, which serves as a global reference capturing long-range temporal and spatial structure.
- For 4K generation, per-tile noise predictions are fused with the latent reference at every diffusion timestep through a closed-form least-squares solution, which preserves global coherence while retaining fine detail.
- Experiments on the VBench-I2V dataset and a fresco I2V dataset show improved global consistency and fidelity over tiled baselines while remaining computationally efficient.
- A spatial regularization variable enables region-level control over motion, allowing explicit trade-offs between creativity and consistency.
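To make the fusion step in the points above concrete, here is a minimal NumPy sketch. It is not the paper's implementation: this summary does not state the actual objective, so the sketch assumes a simple per-element one, in which the fused value x minimizes the sum of (x - p_i)^2 over the tiles covering that element plus lam * (x - r)^2, where p_i are tile predictions and r is the latent reference. Under that assumption the minimizer is x = (sum_i p_i + lam * r) / (n + lam), with n the number of covering tiles. The function name `fuse_tiles_with_prior` and the parameter `lam` are hypothetical, chosen for illustration only.

```python
import numpy as np

def fuse_tiles_with_prior(tile_preds, tile_slices, prior, lam):
    """Fuse overlapping tile predictions with a global reference.

    ASSUMPTION (not from the paper): each element x minimizes
        sum_i (x - p_i)^2 + lam * (x - r)^2
    over the tiles i covering it, giving the closed form
        x = (sum_i p_i + lam * r) / (n_tiles + lam).
    `lam` may be a scalar or an array broadcastable to `prior.shape`,
    which gives region-level control over how strongly the prior pulls.
    Every element needs at least one covering tile or lam > 0.
    """
    prior = np.asarray(prior, dtype=float)
    # Denominator starts at lam (the prior's weight) for every element.
    den = np.broadcast_to(np.asarray(lam, dtype=float), prior.shape).copy()
    # Numerator starts at lam * r.
    num = den * prior
    for pred, sl in zip(tile_preds, tile_slices):
        num[sl] += pred   # accumulate tile predictions
        den[sl] += 1.0    # count covering tiles
    return num / den

# Toy example: a 1x4 "latent" covered by two overlapping 1x3 tiles.
prior = np.zeros((1, 4))
tiles = [np.ones((1, 3)), np.ones((1, 3))]
slices = [(slice(0, 1), slice(0, 3)), (slice(0, 1), slice(1, 4))]

# Uniform prior weight: elements are 0.5, 2/3, 2/3, 0.5.
fused = fuse_tiles_with_prior(tiles, slices, prior, lam=1.0)

# Spatially varying weight: no pull on the left, strong pull on the right.
lam_map = np.array([[0.0, 1.0, 1.0, 3.0]])
fused_regional = fuse_tiles_with_prior(tiles, slices, prior, lam_map)
```

With `lam=0` the function reduces to plain averaging of overlapping tiles, and larger values pull the result toward the reference. Passing an array for `lam` mimics, in a simplified way, the region-level trade-off between creativity and consistency described in the last point.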