OrbitNVS: Harnessing Video Diffusion Priors for Novel View Synthesis

arXiv cs.CV / 3/23/2026


Key Points

  • OrbitNVS reframes novel view synthesis as an orbit video generation task, leveraging pre-trained video diffusion priors to generate unseen viewpoints with higher quality.
  • The approach adds camera adapters to the video model to enable accurate camera control across viewpoints during synthesis.
  • A normal map generation branch and attention-guided use of normal map features improve geometry consistency between views.
  • Pixel-space supervision reduces the blur caused by spatial compression in the latent space. OrbitNVS reports PSNR gains over previous methods on the GSO and OmniObject3D benchmarks, most notably in the single-view setting (e.g., +2.9 dB and +2.4 dB).
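
To make the "orbit video" framing concrete, the sketch below builds a ring of camera poses circling an object at the origin, one pose per video frame. It is illustrative only: the summary does not say how OrbitNVS parameterizes its orbits or what the camera adapters take as input, so the function names (`look_at`, `orbit_poses`), the fixed elevation, the Z-up world, and the OpenGL-style camera convention are all assumptions.

```python
import numpy as np

def look_at(eye, target=None, up=None):
    """Return a 4x4 camera-to-world matrix for a camera at `eye` facing `target`.

    Assumed convention (not from the paper): the camera looks along its
    local -Z axis with +Y up (OpenGL-style), and the world is Z-up.
    """
    target = np.zeros(3) if target is None else np.asarray(target, dtype=float)
    up = np.array([0.0, 0.0, 1.0]) if up is None else np.asarray(up, dtype=float)
    eye = np.asarray(eye, dtype=float)

    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)

    c2w = np.eye(4)
    c2w[:3, 0] = right       # camera +X in world coordinates
    c2w[:3, 1] = true_up     # camera +Y in world coordinates
    c2w[:3, 2] = -forward    # camera +Z points away from the target
    c2w[:3, 3] = eye         # camera position
    return c2w

def orbit_poses(num_views=8, radius=2.0, elevation_deg=20.0):
    """Evenly spaced poses on a circle of constant elevation around the origin."""
    elev = np.deg2rad(elevation_deg)
    azimuths = np.linspace(0.0, 2.0 * np.pi, num_views, endpoint=False)
    poses = []
    for az in azimuths:
        eye = radius * np.array([
            np.cos(elev) * np.cos(az),
            np.cos(elev) * np.sin(az),
            np.sin(elev),
        ])
        poses.append(look_at(eye))
    return np.stack(poses)  # shape: (num_views, 4, 4)

poses = orbit_poses(num_views=8, radius=2.0, elevation_deg=20.0)
```

Each matrix in `poses` could serve as the per-frame camera condition for a camera-controlled video model. How such poses are actually encoded (raw extrinsics, ray maps, or another embedding) is a design choice this summary does not cover.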

Abstract

Novel View Synthesis (NVS) aims to generate unseen views of a 3D object given a limited number of known views. Existing methods often struggle to synthesize plausible views for unobserved regions, particularly under single-view input, and still face challenges in maintaining geometric and appearance consistency. To address these issues, we propose OrbitNVS, which reformulates NVS as an orbit video generation task. Through tailored model design and training strategies, we adapt a pre-trained video generation model to the NVS task, leveraging its rich visual priors to achieve high-quality view synthesis. Specifically, we incorporate camera adapters into the video model to enable accurate camera control. To enhance two key properties of 3D objects, geometry and appearance, we design a normal map generation branch and use normal map features to guide the synthesis of the target views via an attention mechanism, thereby improving geometric consistency. Moreover, we apply pixel-space supervision to alleviate the blurry appearance caused by spatial compression in the latent space. Extensive experiments show that OrbitNVS significantly outperforms previous methods on the GSO and OmniObject3D benchmarks, especially in the challenging single-view setting (e.g., +2.9 dB and +2.4 dB PSNR).
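
For readers less familiar with the metric, the snippet below gives the standard PSNR definition and shows how a gain in dB translates into a reduction in mean squared error. It is a generic sketch, not the paper's evaluation code: the helper names `psnr` and `mse_ratio_for_gain` and the assumption that images are scaled to [0, 1] are illustrative.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain at fixed max_val."""
    return 10.0 ** (gain_db / 10.0)

# Toy check: a uniform error of 0.1 on a [0, 1] image gives MSE = 0.01.
target = np.full((4, 4), 0.5)
pred = np.full((4, 4), 0.6)
score = psnr(pred, target)

ratio_29 = mse_ratio_for_gain(2.9)
ratio_24 = mse_ratio_for_gain(2.4)
```

Because PSNR is logarithmic in MSE, a +2.9 dB gain corresponds to cutting the mean squared error by a factor of about 1.95, and +2.4 dB to a factor of about 1.74.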