FrescoDiffusion: 4K Image-to-Video with Prior-Regularized Tiled Diffusion
arXiv cs.CV / 3/19/2026
Key Points
- FrescoDiffusion introduces a training-free method for coherent large-format image-to-video generation from a single complex image, targeting 4K resolutions.
- The method augments tiled denoising with a precomputed latent prior: it first generates a low-resolution video, which serves as a global reference capturing long-range temporal and spatial structure.
- For 4K generation, per-tile noise predictions are fused with the latent reference at every diffusion timestep using a closed-form least-squares fusion that preserves global coherence while retaining detail.
- Experiments on the VBench-I2V dataset and a fresco I2V dataset show improved global consistency and fidelity over tiled baselines while remaining computationally efficient.
- A spatial regularization variable enables region-level control over motion, allowing explicit trade-offs between creativity and consistency.
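
The fusion step described in the key points can be illustrated with a minimal sketch. The summary does not state the paper's exact objective, so the code below assumes a simple per-element weighted least-squares problem: each fused value `x` minimizes the sum of squared distances to every tile prediction covering it, plus `lam` times the squared distance to the prior. Under that assumption the closed-form solution is `x = (sum of tile predictions + lam * prior) / (number of covering tiles + lam)`. The function name `fuse_tiles_with_prior`, the weight `lam`, and the array layout are hypothetical and chosen for illustration only.

```python
import numpy as np

def fuse_tiles_with_prior(tile_preds, tile_slices, prior, lam):
    """Fuse overlapping per-tile predictions with a global prior (illustrative).

    tile_preds  : list of arrays, each shaped (..., tile_h, tile_w)
    tile_slices : list of (row_slice, col_slice) giving each tile's location
    prior       : array shaped (..., H, W), the global reference
    lam         : scalar or (H, W) array of non-negative prior weights
                  (assumed parameter; 0 ignores the prior, large values follow it)
    """
    prior = np.asarray(prior, dtype=float)
    # Numerator and denominator of the closed-form least-squares solution.
    num = lam * prior
    den = np.zeros_like(prior) + lam
    for pred, (rows, cols) in zip(tile_preds, tile_slices):
        num[..., rows, cols] += pred   # accumulate tile predictions
        den[..., rows, cols] += 1.0    # count tiles covering each element
    # Assumes every element is covered by a tile or has lam > 0.
    return num / den

# Two overlapping tiles on a 4x6 grid, fused with a zero-valued prior.
prior = np.zeros((4, 6))
slices = [(slice(0, 4), slice(0, 4)), (slice(0, 4), slice(2, 6))]
preds = [np.ones((4, 4)), 3.0 * np.ones((4, 4))]

uniform = fuse_tiles_with_prior(preds, slices, prior, lam=1.0)

# A spatially varying weight: left half free, right half pinned to the prior.
lam_map = np.zeros((4, 6))
lam_map[:, 3:] = 1e6
regional = fuse_tiles_with_prior(preds, slices, prior, lam=lam_map)
```

In a sampler, a function like this would be called once per diffusion timestep. Passing an array for `lam` mirrors the idea of a spatial regularization variable: regions with a small weight follow the tile predictions (more freedom), while regions with a large weight stay close to the global reference (more consistency).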