VisionNVS: Self-Supervised Inpainting for Novel View Synthesis under the Virtual-Shift Paradigm

arXiv cs.CV / 3/19/2026

Key Points

  • VisionNVS presents a camera-only framework for novel view synthesis in autonomous driving by reframing the task as self-supervised inpainting under a Virtual-Shift paradigm.
  • The Virtual-Shift strategy uses monocular depth proxies to simulate the occlusion patterns a shifted camera would see and maps them back onto the original view, so raw recorded images provide pixel-perfect supervision and the domain gap of prior approaches is avoided (sketched in code below this list).
  • The Pseudo-3D Seam Synthesis method aggregates data from adjacent cameras during training to model real-world photometric discrepancies and calibration errors for improved spatial consistency.
  • Experiments demonstrate that VisionNVS achieves superior geometric fidelity and visual quality compared with LiDAR-dependent baselines, supporting scalable driving simulation.
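
To make the Virtual-Shift mechanism in the second key point concrete, here is a minimal sketch of how a simulated occlusion pattern could be derived from a monocular depth proxy. It illustrates the idea as described, not the paper's implementation: the pinhole projection, the z-buffer splatting, and names such as `virtual_shift_mask` and the `z_eps` threshold are all assumptions.

```python
# Minimal sketch of the Virtual-Shift occlusion simulation (illustrative
# only; projection model, z-buffer splatting, and all names here are
# assumptions, not the paper's implementation).
import numpy as np

def virtual_shift_mask(depth, K, shift, z_eps=0.05):
    """Mark pixels of the ORIGINAL view that a virtually shifted camera
    would no longer see, using only a monocular depth proxy.

    depth: (H, W) metric depth; K: (3, 3) intrinsics;
    shift: (3,) virtual camera translation in camera coordinates.
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    # Backproject every pixel to a 3D point in the camera frame.
    rays = np.linalg.inv(K) @ np.stack([u.ravel(), v.ravel(), np.ones(H * W)])
    pts = rays * depth.ravel()
    # Express the points in the shifted (virtual) camera frame and project.
    pts_v = pts - np.asarray(shift, dtype=float).reshape(3, 1)
    z = pts_v[2]
    uv = np.round((K @ pts_v)[:2] / np.maximum(z, 1e-6)).astype(int)
    in_view = (z > 0) & (uv[0] >= 0) & (uv[0] < W) & (uv[1] >= 0) & (uv[1] < H)
    flat = uv[1] * W + uv[0]
    # Z-buffer splat: the nearest point claiming each virtual pixel wins.
    zbuf = np.full(H * W, np.inf)
    np.minimum.at(zbuf, flat[in_view], z[in_view])
    # Losers of the depth test are occluded in the virtual view; that
    # pattern, indexed in the ORIGINAL view, is the inpainting mask.
    occluded = np.zeros(H * W, dtype=bool)
    occluded[in_view] = z[in_view] > zbuf[flat[in_view]] + z_eps
    return occluded.reshape(H, W)
```

With such a mask, training becomes self-supervised: the network inpaints the masked original frame, and the raw frame itself serves as the pixel-perfect target, which is the supervision trick the key point describes.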

Abstract

A fundamental bottleneck in Novel View Synthesis (NVS) for autonomous driving is the inherent supervision gap on novel trajectories: models are tasked with synthesizing unseen views during inference, yet lack ground-truth images for these shifted poses during training. In this paper, we propose VisionNVS, a camera-only framework that reformulates view synthesis from an ill-posed extrapolation problem into a self-supervised inpainting task. By introducing a "Virtual-Shift" strategy, we use monocular depth proxies to simulate occlusion patterns and map them onto the original view. This paradigm shift allows the use of raw, recorded images as pixel-perfect supervision, effectively eliminating the domain gap inherent in previous approaches. Furthermore, we address spatial consistency through a Pseudo-3D Seam Synthesis strategy, which integrates visual data from adjacent cameras during training to explicitly model real-world photometric discrepancies and calibration errors. Experiments demonstrate that VisionNVS achieves superior geometric fidelity and visual quality compared to LiDAR-dependent baselines, offering a robust solution for scalable driving simulation.
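
The Pseudo-3D Seam Synthesis strategy can likewise be sketched. The snippet below shows one plausible reading of the abstract: pixels warped from an adjacent camera are composited into a seam region after injecting photometric jitter and small extrinsic (calibration) noise. Every function name, noise magnitude, and the `warp_fn` interface here is a hypothetical stand-in, not the authors' code.

```python
# Hedged sketch of Pseudo-3D Seam Synthesis as described in the abstract:
# composite adjacent-camera pixels into a seam region after injecting
# photometric and calibration noise. Names, noise magnitudes, and the
# warp_fn interface are hypothetical stand-ins.
import numpy as np

def perturb_extrinsics(R, t, rot_std_deg=0.2, trans_std=0.02, rng=None):
    """Add small random rotation/translation noise to the neighbor camera's
    extrinsics, emulating imperfect cross-camera calibration."""
    rng = rng or np.random.default_rng()
    ax, ay, az = np.deg2rad(rng.normal(0.0, rot_std_deg, 3))
    Rx = np.array([[1, 0, 0], [0, np.cos(ax), -np.sin(ax)], [0, np.sin(ax), np.cos(ax)]])
    Ry = np.array([[np.cos(ay), 0, np.sin(ay)], [0, 1, 0], [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0], [np.sin(az), np.cos(az), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx @ R, np.asarray(t, dtype=float) + rng.normal(0.0, trans_std, 3)

def photometric_jitter(img, gain_std=0.08, bias_std=0.02, rng=None):
    """Per-channel gain/bias on a float image in [0, 1], emulating exposure
    and white-balance gaps between neighboring cameras."""
    rng = rng or np.random.default_rng()
    gain = 1.0 + rng.normal(0.0, gain_std, 3)
    bias = rng.normal(0.0, bias_std, 3)
    return np.clip(img * gain + bias, 0.0, 1.0)

def seam_composite(main_img, neighbor_img, seam_mask, warp_fn, R, t, rng=None):
    """Fill the seam region of the main view with perturbed neighbor pixels.
    warp_fn stands in for any depth-based reprojection of the neighbor
    image into the main view given extrinsics (R, t)."""
    R_n, t_n = perturb_extrinsics(R, t, rng=rng)
    warped = warp_fn(photometric_jitter(neighbor_img, rng=rng), R_n, t_n)
    m = seam_mask[..., None].astype(main_img.dtype)
    return main_img * (1 - m) + warped * m
```

The design intent, per the abstract, is that training on composites carrying realistic photometric and calibration inconsistencies pushes the model toward spatially consistent synthesis across camera seams, rather than letting it assume idealized multi-camera inputs.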