Controllable Generative Video Compression

arXiv cs.CV / 4/9/2026


Key Points

  • The paper introduces Controllable Generative Video Compression (CGVC), which aims to reconcile the tradeoff in perceptual video compression between perceptual realism and signal fidelity.
  • CGVC encodes representative keyframes and uses them as structural priors for generating non-keyframes, while also coding dense per-frame control signals to preserve finer details, structure, and semantics.
  • Non-keyframes are reconstructed via a controllable generative video model that enforces temporal and content consistency guided by the provided priors.
  • To improve color recovery, the authors propose a color-distance-guided keyframe selection algorithm that adaptively chooses keyframes based on color similarity.
  • Experiments indicate CGVC outperforms prior perceptual video compression approaches on both objective signal fidelity and perceptual quality metrics.
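
The summary does not spell out how the color-distance-guided keyframe selection works. As a rough illustration only, the sketch below assumes a greedy rule: a frame becomes a new keyframe when its mean color drifts more than a threshold away from the most recent keyframe. The names `mean_color`, `color_distance`, and `select_keyframes`, the Euclidean distance in RGB, the flat pixel-list frame representation, and the threshold are all assumptions made for this example, not the authors' algorithm.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical frame representation: a flat sequence of (R, G, B) pixels.
Pixel = Tuple[float, float, float]
Frame = Sequence[Pixel]


def mean_color(frame: Frame) -> Pixel:
    """Average each color channel over all pixels of a (non-empty) frame."""
    n = len(frame)
    return (
        sum(p[0] for p in frame) / n,
        sum(p[1] for p in frame) / n,
        sum(p[2] for p in frame) / n,
    )


def color_distance(a: Pixel, b: Pixel) -> float:
    """Euclidean distance between two colors (an assumed distance measure)."""
    return math.dist(a, b)


def select_keyframes(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Greedy selection: start a new keyframe whenever the mean color
    moves farther than `threshold` from the last selected keyframe."""
    if not frames:
        return []
    keyframes = [0]                      # the first frame is always a keyframe
    reference = mean_color(frames[0])
    for i in range(1, len(frames)):
        current = mean_color(frames[i])
        if color_distance(current, reference) > threshold:
            keyframes.append(i)
            reference = current          # later frames are compared to this one
    return keyframes
```

With a rule of this kind, a sequence whose colors stay stable yields few keyframes, while a scene with abrupt color changes yields more, which matches the adaptive behavior the summary describes.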

Abstract

Perceptual video compression adopts generative video modeling to improve perceptual realism but frequently sacrifices signal fidelity, diverging from the goal of video compression, which is to faithfully reproduce the visual signal. To alleviate this dilemma between perception and fidelity, we propose the Controllable Generative Video Compression (CGVC) paradigm, which faithfully generates details guided by multiple visual conditions. Under this paradigm, representative keyframes of the scene are coded and used to provide structural priors for non-keyframe generation. A dense per-frame control prior is additionally coded to better preserve the finer structure and semantics of each non-keyframe. Guided by these priors, non-keyframes are reconstructed by a controllable video generation model with temporal and content consistency. Furthermore, to accurately recover the color information of the video, we develop a color-distance-guided keyframe selection algorithm that adaptively chooses keyframes. Experimental results show that CGVC outperforms previous perceptual video compression methods in terms of both signal fidelity and perceptual quality.
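
The data flow described in the abstract can be pictured with the toy sketch below. It shows only which payload each frame would carry under this paradigm: coded pixels for keyframes and a coded control prior for non-keyframes. The codec and the generative model are replaced by stand-ins, frames are plain strings, and every name (`CodedFrame`, `encode_sequence`, `decode_sequence`, `extract_control`, `generate`) is hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Set


@dataclass
class CodedFrame:
    index: int
    is_keyframe: bool
    pixels: Optional[str] = None    # stand-in for coded keyframe content
    control: Optional[str] = None   # stand-in for the dense per-frame control prior


def encode_sequence(
    frames: Sequence[str],
    keyframe_ids: Set[int],
    extract_control: Callable[[str], str],
) -> List[CodedFrame]:
    """Code keyframes in full; code only a control signal for the rest."""
    coded = []
    for i, frame in enumerate(frames):
        if i in keyframe_ids:
            coded.append(CodedFrame(i, True, pixels=frame))
        else:
            coded.append(CodedFrame(i, False, control=extract_control(frame)))
    return coded


def decode_sequence(
    coded: Sequence[CodedFrame],
    generate: Callable[[Dict[int, str], str], str],
) -> List[str]:
    """Pass keyframes through; synthesize non-keyframes from the decoded
    keyframes (structural priors) plus each frame's own control signal."""
    keyframes = {c.index: c.pixels for c in coded if c.is_keyframe}
    output = []
    for c in coded:
        if c.is_keyframe:
            output.append(c.pixels)
        else:
            output.append(generate(keyframes, c.control))
    return output
```

In a real system, `generate` would be the controllable video generation model and `extract_control` would produce whatever dense control representation the codec transmits; here both are left as caller-supplied functions so the sketch makes no claim about their form.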