InpaintSLat: Inpainting Structured 3D Latents via Initial Noise Optimization

arXiv cs.CV / 5/4/2026

Key Points

  • The paper proposes InpaintSLat, a training-free method for controllable 3D inpainting that relies on optimizing the initial noise rather than retraining or heavily modifying the diffusion process.
  • It argues that in structured 3D latent diffusion, the scene’s geometric structure forms early and is highly sensitive to the initial noise, which can lead to instability during inpainting/editing.
  • InpaintSLat improves fidelity by updating the initial noise using a backpropagation approximation derived from the rectified flow model, together with spectral parameterization for stable and efficient optimization.
  • Experiments show consistent improvements in contextual consistency and prompt alignment over representative training-free inpainting baselines. The authors position initial-noise control as an independent control lever for 3D inpainting, orthogonal to conventional sampling-trajectory manipulation.
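
The idea of updating the initial noise rather than the sampler can be illustrated with a toy example. The sketch below is not the paper's implementation. `toy_velocity` is a hypothetical linear stand-in for a learned rectified-flow velocity field, the latent is a flat list rather than a structured 3D latent, and the backpropagation approximation is assumed here to be an identity Jacobian, so the gradient at the sampled output is applied directly to the noise. All names and hyperparameters are illustrative.

```python
import random

def toy_velocity(x, t):
    # Hypothetical stand-in for a learned rectified-flow velocity field.
    return [-0.5 * xi + 1.0 for xi in x]

def sample(x0, steps=8):
    # Euler integration of dx/dt = v(x, t) from t=0 (noise) to t=1 (latent).
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = toy_velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def optimize_initial_noise(x0, context, known, iters=30, lr=1.0):
    # known[i] is 1.0 where existing context must be preserved, 0.0 in the hole.
    losses = []
    for _ in range(iters):
        x1 = sample(x0)
        residual = [m * (a - c) for m, a, c in zip(known, x1, context)]
        losses.append(0.5 * sum(r * r for r in residual))
        # Assumption: approximate d(x1)/d(x0) by the identity, so the output
        # gradient is used directly as the gradient for the initial noise.
        x0 = [z - lr * r for z, r in zip(x0, residual)]
    return x0, losses

random.seed(0)
n = 16
context = [random.gauss(0, 1) for _ in range(n)]
known = [0.0 if 6 <= i < 10 else 1.0 for i in range(n)]  # hole at indices 6..9
x0_init = [random.gauss(0, 1) for _ in range(n)]
x0_opt, losses = optimize_initial_noise(x0_init, context, known)
print("context loss before:", losses[0], "after:", losses[-1])
```

In this toy setting the loss on the known region shrinks across iterations, while the noise entries inside the hole are left untouched because the masked residual is zero there.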

Abstract

We present a training-free approach for controllable 3D inpainting based on initial noise optimization. In the structured 3D latent diffusion framework, we observe that the underlying geometric structure is established during the early stages of the diffusion process and exhibits high sensitivity to the initial noise. Such characteristics compromise stability in tasks like inpainting and editing, where the model must ensure strict alignment with the existing context while synthesizing a new structure. In this paper, we introduce a strategy to optimize the initial noise within the structured 3D latent diffusion framework, ensuring high-fidelity 3D inpainting. Specifically, we update the initial noise by leveraging a backpropagation approximation grounded in the rectified flow model, with the spectral parameterization specially designed for robust and efficient structured 3D latent optimization. Experiments demonstrate consistent improvements in contextual consistency and prompt alignment over representative training-free inpainting baselines, establishing initial noise control as an independent dimension for 3D inpainting, orthogonal to conventional sampling trajectory manipulation.
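
The abstract does not spell out the spectral parameterization, so the following is only a rough sketch of the general idea of optimizing noise through its frequency coefficients. It assumes a 1D unitary discrete Fourier transform and a hypothetical `lowpass_rates` helper that assigns per-frequency step sizes; the paper's actual basis, dimensionality, and weighting may differ.

```python
import cmath
import math

def dft(x):
    # Unitary discrete Fourier transform of a 1D sequence.
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            / math.sqrt(n) for k in range(n)]

def idft(c):
    # Inverse of the unitary transform above.
    n = len(c)
    return [sum(c[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n))
            / math.sqrt(n) for j in range(n)]

def lowpass_rates(n, base_lr=0.5, cutoff=2):
    # Hypothetical schedule: update only frequencies up to `cutoff`.
    # Symmetric in k and n - k, so a real signal stays real after the step.
    return [base_lr if min(k, n - k) <= cutoff else 0.0 for k in range(n)]

def spectral_step(x0, grad, lr_per_freq):
    # With a unitary transform, the gradient with respect to the coefficients
    # is the transform of the spatial gradient, so the step is taken there.
    coeffs, g = dft(x0), dft(grad)
    updated = [c - lr * gk for c, gk, lr in zip(coeffs, g, lr_per_freq)]
    return [v.real for v in idft(updated)]

x0 = [0.3, -1.2, 0.8, 0.1, -0.5, 1.4, -0.9, 0.2]
grad = [1.0, 0.0, -1.0, 0.5, 0.25, -0.75, 0.0, 1.0]
smooth = spectral_step(x0, grad, lowpass_rates(len(x0), cutoff=0))
```

With a uniform step size across all frequencies, this reduces to an ordinary gradient step on the noise itself. A frequency-dependent schedule instead restricts or reweights which components of the noise are allowed to move, which is one plausible way such a parameterization could stabilize optimization.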