TransSplat: Unbalanced Semantic Transport for Language-Driven 3DGS Editing

arXiv cs.CV / 4/22/2026

Key Points

  • The paper addresses a core limitation of language-driven 3D Gaussian Splatting (3DGS) editing: prior pipelines improve view consistency but do not explicitly model semantic correspondence between edited 2D evidence and 3D Gaussians.
  • It introduces TransSplat, which casts language-driven 3DGS editing as a “multi-view unbalanced semantic transport” problem using correspondences between visible Gaussians and view-specific editing prototypes.
  • TransSplat recovers a cross-view shared canonical 3D edit field to enable unified updates of the 3D appearance across different views.
  • To prevent unintended changes, it uses transport residuals to suppress erroneous edits in non-target regions, reducing edit leakage and improving local control accuracy.
  • Experiments indicate that, relative to methods focused on view-consistency improvements, TransSplat yields better local editing accuracy and structural consistency.

Abstract

Language-driven 3D Gaussian Splatting (3DGS) editing provides a more convenient approach for modifying complex scenes in VR/AR. Standard pipelines typically adopt a two-stage strategy: first editing multiple 2D views, and then optimizing the 3D representation to match these edited observations. Existing methods mainly improve view consistency through multi-view feature fusion, attention filtering, or iterative recalibration. However, they fail to explicitly address a more fundamental issue: the semantic correspondence between edited 2D evidence and 3D Gaussians. To tackle this problem, we propose TransSplat, which formulates language-driven 3DGS editing as a multi-view unbalanced semantic transport problem. Specifically, our method establishes correspondences between visible Gaussians and view-specific editing prototypes, thereby explicitly characterizing the semantic relationship between edited 2D evidence and 3D Gaussians. It further recovers a cross-view shared canonical 3D edit field to guide unified 3D appearance updates. In addition, we use transport residuals to suppress erroneous edits in non-target regions, mitigating edit leakage and improving local control precision. Qualitative and quantitative results show that, compared with existing 3D editing methods centered on enhancing view consistency, TransSplat achieves superior performance in local editing accuracy and structural consistency.
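
To make the "unbalanced semantic transport" idea concrete, the following is a minimal sketch of generic entropic unbalanced optimal transport for a single view, written with NumPy. It is not the authors' implementation: the abstract does not specify the cost, the solver, or the exact definition of the residual. The function names (unbalanced_sinkhorn, transport_residual), the squared-distance cost, the toy 2D features, and the parameters eps and rho are all assumptions made for illustration. The sketch shows how relaxing the marginal constraints lets a Gaussian that matches no editing prototype keep almost none of its mass in the plan, so that its residual can serve as a signal for suppressing edits there.

```python
import numpy as np

def unbalanced_sinkhorn(cost, a, b, eps=0.1, rho=1.0, n_iters=200):
    """Entropic unbalanced OT with KL-relaxed marginals (generic scaling iterations).

    cost: (n, m) matching cost between n visible Gaussians and m prototypes.
    a, b: positive mass vectors of length n and m.
    eps:  entropic regularization strength (assumed value).
    rho:  marginal relaxation strength; larger values approach balanced OT.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = rho / (rho + eps)        # damping exponent from the KL relaxation
    for _ in range(n_iters):
        u = (a / np.maximum(K @ v, 1e-300)) ** power
        v = (b / np.maximum(K.T @ u, 1e-300)) ** power
    return u[:, None] * K * v[None, :]

def transport_residual(plan, a):
    """Fraction of each Gaussian's mass left untransported (hypothetical definition)."""
    return np.clip(1.0 - plan.sum(axis=1) / a, 0.0, 1.0)

# Toy single-view example: 4 visible Gaussians and 2 editing prototypes,
# each described by a made-up 2D semantic feature.
gaussian_feats = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [3.0, 3.0]])
prototype_feats = np.array([[0.0, 0.0], [1.0, 1.0]])

# Squared Euclidean distance as the matching cost (an assumption).
diff = gaussian_feats[:, None, :] - prototype_feats[None, :, :]
cost = (diff ** 2).sum(axis=-1)

a = np.full(4, 0.25)   # uniform mass on Gaussians
b = np.full(2, 0.5)    # uniform mass on prototypes

plan = unbalanced_sinkhorn(cost, a, b)
residual = transport_residual(plan, a)

# One possible use: gate each Gaussian's edit strength by its transported mass.
edit_gate = 1.0 - residual
print(np.round(residual, 3))
```

In this toy setup the fourth Gaussian lies far from both prototypes, so its residual is close to 1 and its gate close to 0, while the Gaussians that coincide with a prototype keep their full mass. Extending the sketch to multiple views would mean solving one such problem per view and aggregating the resulting plans into a shared per-Gaussian edit; how TransSplat performs that aggregation is not described in the abstract.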