Cross-Resolution Diffusion Models via Network Pruning

arXiv cs.CV / 4/8/2026


Key Points

  • UNet-based diffusion models often lose semantic alignment and become structurally unstable when generating at resolutions not seen during training.
  • The paper attributes this degradation to resolution-dependent parameter behaviors, where some weights that work at the default scale become harmful after spatial scaling changes.
  • It proposes CR-Diff, a two-stage approach that first performs block-wise pruning to remove adverse weights and then applies pruned output amplification to better purify predictions.
  • Experiments indicate CR-Diff improves perceptual fidelity and semantic coherence across unseen resolutions while largely maintaining performance at the default resolution.
  • The method also enables prompt-specific refinement, allowing targeted quality improvements on demand.
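
To make the first stage concrete, here is a minimal sketch of block-wise pruning on a toy model. It is an illustration, not the paper's procedure: this summary does not say how CR-Diff decides which weights are adverse, so the sketch uses weight magnitude as a stand-in criterion. The block names, the per-block ratios, and the helpers `prune_block` and `prune_blockwise` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: block name -> flattened list of weights.
Weights = Dict[str, List[float]]


def prune_block(weights: List[float], ratio: float) -> List[float]:
    """Zero out the `ratio` fraction of weights with the smallest magnitude.

    Magnitude is an assumed stand-in criterion; the summary does not state
    how CR-Diff identifies adverse weights.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    n_prune = int(len(weights) * ratio)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)  # copy, so the original block is left untouched
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def prune_blockwise(model: Weights, ratios: Dict[str, float]) -> Weights:
    """Apply a separate pruning ratio per block; unlisted blocks are kept."""
    return {
        name: prune_block(w, ratios.get(name, 0.0))
        for name, w in model.items()
    }


# Toy stand-in for a UNet with three named blocks (values are made up).
toy_unet = {
    "down_0": [0.9, -0.05, 0.4, 0.01],
    "mid": [0.3, -0.7, 0.02, 0.6],
    "up_0": [0.1, 0.2, -0.3, 0.4],
}
pruned_unet = prune_blockwise(toy_unet, {"down_0": 0.5, "mid": 0.25})
```

The pruning ratio is set per block because the summary describes the pruning as block-wise. In a real model the blocks would be tensors and the selection rule would come from the paper, not from magnitude.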

Abstract

Diffusion models have demonstrated impressive image synthesis performance, yet many UNet-based models are trained at fixed resolutions, and their quality tends to degrade when generating images at resolutions not seen during training. We trace this issue to resolution-dependent parameter behaviors: weights that function well at the default resolution can become adverse when spatial scales shift, weakening semantic alignment and causing structural instability in the UNet architecture. Based on this analysis, this paper introduces CR-Diff, a novel method that improves cross-resolution visual consistency by pruning some parameters of the diffusion model. CR-Diff has two stages. It first performs block-wise pruning to selectively eliminate adverse weights. It then applies pruned output amplification to further purify the pruned predictions. Extensive experiments suggest that CR-Diff improves perceptual fidelity and semantic coherence across various diffusion backbones and unseen resolutions, while largely preserving performance at default resolutions. Additionally, CR-Diff supports prompt-specific refinement, enabling quality enhancement on demand.
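
The abstract does not give a formula for the second stage, pruned output amplification. One plausible reading, offered purely as an assumption, is an extrapolation in the style of classifier-free guidance: take the prediction of the original (dense) network and push it further in the direction of the pruned network's prediction. The function `amplify_pruned_output` and its `scale` parameter are hypothetical names for this sketch.

```python
from typing import List


def amplify_pruned_output(
    dense: List[float], pruned: List[float], scale: float
) -> List[float]:
    """Extrapolate from the dense prediction toward the pruned prediction.

    Assumed form (not confirmed by the source):
        out = dense + scale * (pruned - dense)
    scale = 0 returns the dense prediction, scale = 1 returns the pruned
    prediction, and scale > 1 amplifies the change introduced by pruning.
    """
    if len(dense) != len(pruned):
        raise ValueError("predictions must have the same length")
    return [d + scale * (p - d) for d, p in zip(dense, pruned)]


# Made-up noise predictions from the dense and pruned networks.
dense_pred = [1.0, 2.0]
pruned_pred = [1.5, 1.0]
amplified = amplify_pruned_output(dense_pred, pruned_pred, scale=2.0)
```

Under this assumed form, a scale of 1 reduces to using the pruned model alone, so any benefit from amplification would come from scales above 1.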