FluSplat: Sparse-View 3D Editing without Test-Time Optimization

arXiv cs.CV / 4/23/2026


Key Points

  • The FluSplat paper introduces a feed-forward approach for cross-view consistent 3D scene editing starting from sparse views.
  • Instead of running costly test-time optimization that alternates between 2D diffusion editing and 3D reconstruction, it uses cross-view regularization in the image domain during training.
  • Multi-view edits are jointly supervised with geometric alignment constraints so the method can produce view-consistent results without per-scene inference-time refinement.
  • Edited views are then lifted into 3D using a feed-forward 3D Gaussian Splatting (3DGS) model in a single forward pass, yielding a coherent 3DGS representation.
  • Experiments report competitive editing quality, significantly better cross-view consistency than optimization-based pipelines, and inference time reductions by orders of magnitude.
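The cross-view regularization described above can be sketched as a simple consistency penalty: pixels in different edited views that are known (from scene geometry) to observe the same 3D point should agree in color. The function below is an illustrative assumption of what such a loss could look like, not the paper's actual implementation; the correspondence format and function name are hypothetical.

```python
import numpy as np

def cross_view_consistency_loss(edited_views, correspondences):
    """Hypothetical sketch of a cross-view regularizer: penalize color
    disagreement between edited views at geometrically matched pixels.

    edited_views: list of (H, W, 3) float arrays (the edited images).
    correspondences: list of (i, j, pix_i, pix_j) tuples, where pix_i
        and pix_j are (N, 2) integer arrays of matching (row, col)
        pixel coordinates in views i and j, e.g. obtained by warping
        with known depth and camera poses.
    """
    total, count = 0.0, 0
    for i, j, pix_i, pix_j in correspondences:
        colors_i = edited_views[i][pix_i[:, 0], pix_i[:, 1]]  # (N, 3)
        colors_j = edited_views[j][pix_j[:, 0], pix_j[:, 1]]  # (N, 3)
        total += float(np.mean((colors_i - colors_j) ** 2))
        count += 1
    return total / max(count, 1)

# Toy example: two 4x4 views with three matched pixels.
view_a = np.zeros((4, 4, 3))
view_b = np.zeros((4, 4, 3))
pix = np.array([[0, 0], [1, 1], [2, 2]])
loss = cross_view_consistency_loss([view_a, view_b], [(0, 1, pix, pix)])
```

Because the views above agree exactly at the matched pixels, the loss is zero; editing one view's matched pixels without the other would make it positive, which is the signal the training-time regularization exploits in place of iterative 3D refinement.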

Abstract

Recent advances in text-guided image editing and 3D Gaussian Splatting (3DGS) have enabled high-quality 3D scene manipulation. However, existing pipelines rely on iterative edit-and-fit optimization at test time, alternating between 2D diffusion editing and 3D reconstruction. This process is computationally expensive, scene-specific, and prone to cross-view inconsistencies. We propose a feed-forward framework for cross-view consistent 3D scene editing from sparse views. Instead of enforcing consistency through iterative 3D refinement, we introduce a cross-view regularization scheme in the image domain during training. By jointly supervising multi-view edits with geometric alignment constraints, our model produces view-consistent results without per-scene optimization at inference. The edited views are then lifted into 3D via a feed-forward 3DGS model, yielding a coherent 3DGS representation in a single forward pass. Experiments demonstrate competitive editing fidelity and substantially improved cross-view consistency compared to optimization-based methods, while reducing inference time by orders of magnitude.