VibeFlow: Versatile Video Chroma-Lux Editing through Self-Supervised Learning

arXiv cs.CV · April 16, 2026


Key Points

  • VibeFlow targets video chroma-lux editing: modifying illumination and color while preserving structural and temporal fidelity.
  • The paper introduces a self-supervised approach that leverages pre-trained video generation models, using a disentangled data perturbation pipeline to recombine structure from source videos with color-illumination cues from reference images.
  • To improve temporal and structural accuracy, VibeFlow adds Residual Velocity Fields and a Structural Distortion Consistency Regularization to mitigate discretization issues common in flow-based methods.
  • The framework is designed to remove the need for expensive supervised training with synthetic paired data and to generalize in a zero-shot manner to tasks like relighting, recoloring, low-light enhancement, day-night translation, and object-specific color editing.
  • The authors report that VibeFlow delivers strong visual quality with reduced computational overhead and provide a public project webpage for replication.
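The disentangled perturbation idea above, recombining structure from a source video with color-illumination cues from a reference image, can be illustrated with a simple luma/chroma swap. This is a hedged sketch, not the paper's actual pipeline: the function names and the YCbCr-based recombination are assumptions chosen only to make the structure/color split concrete.

```python
import numpy as np

def rgb_to_ycbcr(img):
    # ITU-R BT.601 full-range conversion; img is a float array in [0, 1], shape (H, W, 3)
    m = np.array([[ 0.299,     0.587,     0.114   ],
                  [-0.168736, -0.331264,  0.5     ],
                  [ 0.5,      -0.418688, -0.081312]])
    ycbcr = img @ m.T
    ycbcr[..., 1:] += 0.5
    return ycbcr

def ycbcr_to_rgb(ycbcr):
    inv = np.array([[1.0,  0.0,       1.402   ],
                    [1.0, -0.344136, -0.714136],
                    [1.0,  1.772,     0.0     ]])
    img = ycbcr.copy()
    img[..., 1:] -= 0.5
    return np.clip(img @ inv.T, 0.0, 1.0)

def recombine(source_frame, reference_image):
    """Keep luma (a crude structure proxy) from the source frame while
    matching the chroma statistics of the reference image."""
    src = rgb_to_ycbcr(source_frame)
    ref = rgb_to_ycbcr(reference_image)
    out = src.copy()
    # Match mean/std of the source chroma channels (Cb, Cr) to the reference's
    for c in (1, 2):
        s, r = src[..., c], ref[..., c]
        out[..., c] = (s - s.mean()) / (s.std() + 1e-6) * r.std() + r.mean()
    return ycbcr_to_rgb(out)
```

In the paper's setting the recombination is learned by a video generation model rather than computed in closed form; this toy version only shows what "structure from source, color from reference" means at the pixel level.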

Abstract

Video chroma-lux editing, which aims to modify illumination and color while preserving structural and temporal fidelity, remains a significant challenge. Existing methods typically rely on expensive supervised training with synthetic paired data. This paper proposes VibeFlow, a novel self-supervised framework that unleashes the intrinsic physical understanding of pre-trained video generation models. Instead of learning color and light transitions from scratch, we introduce a disentangled data perturbation pipeline that forces the model to adaptively recombine structure from source videos with color-illumination cues from reference images, enabling robust disentanglement in a self-supervised manner. Furthermore, to rectify discretization errors inherent in flow-based models, we introduce Residual Velocity Fields alongside a Structural Distortion Consistency Regularization, ensuring rigorous structural preservation and temporal coherence. Our framework eliminates the need for costly training resources and generalizes in a zero-shot manner to diverse applications, including video relighting, recoloring, low-light enhancement, day-night translation, and object-specific color editing. Extensive experiments demonstrate that VibeFlow achieves impressive visual quality with significantly reduced computational overhead. Our project is publicly available at https://lyf1212.github.io/VibeFlow-webpage.
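The "discretization errors inherent in flow-based models" come from integrating the learned velocity ODE with a finite number of steps. The Residual Velocity Fields idea can be sketched as a correction term added to the base velocity before each integration step. This is a minimal illustration, not the authors' implementation: `v_base` and `v_residual` stand in for the pre-trained and learned velocity networks, and the constant-velocity test fields are hypothetical.

```python
import numpy as np

def euler_sample(x0, v_base, v_residual, steps=20):
    """Euler integration of a flow ODE, x' = v_base(x, t) + v_residual(x, t),
    from t = 0 to t = 1. The residual field is meant to nudge each coarse
    Euler update toward the true continuous trajectory."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = v_base(x, t) + v_residual(x, t)  # corrected velocity at this step
        x = x + v * dt
    return x

# Hypothetical stand-in fields: a constant base velocity and a zero residual.
v_base = lambda x, t: np.ones_like(x)
v_residual = lambda x, t: np.zeros_like(x)
x1 = euler_sample(np.zeros(4), v_base, v_residual)  # integrates to ~[1, 1, 1, 1]
```

With a zero residual this reduces to plain Euler flow sampling; in the paper the residual is learned so that structure is preserved across the coarse steps.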