DVFace: Spatio-Temporal Dual-Prior Diffusion for Video Face Restoration

arXiv cs.CV / 4/17/2026


Key Points

  • DVFace is a newly proposed one-step diffusion framework specifically designed to restore degraded video faces with realistic details while maintaining stable identity and temporal coherence.
  • The method uses a spatio-temporal dual-codebook design to extract complementary spatial and temporal facial priors from degraded input videos.
  • An asymmetric spatio-temporal fusion module injects these priors into the diffusion backbone according to their different roles, aiming to improve fidelity without expensive multi-step sampling.
  • Experiments across multiple benchmarks indicate that DVFace achieves better restoration quality, stronger temporal consistency, and improved identity preservation than recent competing approaches.
  • The authors release their code at https://github.com/zhengchen1999/DVFace.
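
To make the dual-codebook idea in the points above more concrete, here is a minimal sketch of a codebook lookup by nearest-neighbor vector quantization. It is not the DVFace implementation. The function names, the one-vector-per-frame features, the use of frame-to-frame differences as the temporal signal, and the random codebooks are all assumptions made for illustration.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry.

    features: (N, D) array, codebook: (K, D) array.
    Returns the selected entries (N, D) and their indices (N,).
    """
    # Squared Euclidean distance between every feature and every entry: (N, K).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx

def dual_prior_lookup(frame_feats, spatial_cb, temporal_cb):
    """Hypothetical dual lookup: one prior per frame from each codebook.

    frame_feats: (T, D), one feature vector per frame (a simplification).
    """
    # Spatial prior: quantize each frame's own features.
    spatial_prior, s_idx = quantize(frame_feats, spatial_cb)
    # Temporal prior (assumption): quantize frame-to-frame feature changes.
    # Prepending the first frame makes the first difference all zeros.
    diffs = np.diff(frame_feats, axis=0, prepend=frame_feats[:1])
    temporal_prior, t_idx = quantize(diffs, temporal_cb)
    return spatial_prior, temporal_prior, s_idx, t_idx

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 4))        # 5 frames, 4-dim features
spatial_cb = rng.normal(size=(8, 4))   # 8 hypothetical spatial codes
temporal_cb = rng.normal(size=(6, 4))  # 6 hypothetical temporal codes
sp, tp, si, ti = dual_prior_lookup(feats, spatial_cb, temporal_cb)
```

In this sketch both priors have the same shape as the input features, so a downstream fusion step could combine them with the backbone features frame by frame.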

Abstract

Video face restoration aims to enhance degraded face videos into high-quality results with realistic facial details, stable identity, and temporal coherence. Recent diffusion-based methods have brought strong generative priors to restoration and enabled more realistic detail synthesis. However, existing approaches for face videos still rely heavily on generic diffusion priors and multi-step sampling, which limit both facial adaptation and inference efficiency. These limitations motivate the use of one-step diffusion for video face restoration, yet achieving faithful facial recovery alongside temporally stable outputs remains challenging. In this paper, we propose DVFace, a one-step diffusion framework for real-world video face restoration. Specifically, we introduce a spatio-temporal dual-codebook design to extract complementary spatial and temporal facial priors from degraded videos. We further propose an asymmetric spatio-temporal fusion module to inject these priors into the diffusion backbone according to their distinct roles. Evaluation on various benchmarks shows that DVFace delivers superior restoration quality, temporal consistency, and identity preservation compared to recent methods. Code: https://github.com/zhengchen1999/DVFace.
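
The efficiency argument in the abstract comes down to how many times the network runs per clip. The toy sketch below only counts forward passes. The denoiser is a stand-in that shrinks its input, the step count of 50 is an arbitrary illustrative value, and none of the names come from the DVFace code.

```python
import numpy as np

class CountingDenoiser:
    """Toy stand-in for a diffusion network that counts forward passes."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, t):
        self.calls += 1
        # Toy update only: a real network would predict a cleaner frame.
        return 0.9 * x

def multi_step_restore(x, denoiser, num_steps):
    """Iterative sampling: one network call per timestep."""
    for t in reversed(range(num_steps)):
        x = denoiser(x, t)
    return x

def one_step_restore(x, denoiser, t_fixed=0):
    """One-step restoration: a single network call at a fixed timestep."""
    return denoiser(x, t_fixed)

frames = np.ones((4, 8, 8))  # 4 hypothetical 8x8 frames

multi = CountingDenoiser()
multi_out = multi_step_restore(frames, multi, num_steps=50)

single = CountingDenoiser()
single_out = one_step_restore(frames, single)

print(multi.calls, single.calls)
```

With the same network size, the cost per clip scales with the number of calls, which is why a one-step formulation is attractive as long as restoration quality holds up.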