TreeGaussian: Tree-Guided Cascaded Contrastive Learning for Hierarchical Consistent 3D Gaussian Scene Segmentation and Understanding

arXiv cs.CV / 4/7/2026


Key Points

  • TreeGaussian is a new research method for improving hierarchical and whole-part 3D semantic segmentation using 3D Gaussian Splatting by explicitly modeling object-part relationships via a multi-level object tree.
  • The approach uses a two-stage cascaded contrastive learning strategy that refines features progressively from global to local to reduce redundancy, mitigate contrastive saturation, and stabilize training.
  • It introduces a Consistent Segmentation Detection (CSD) mechanism and a graph-based denoising module to improve cross-view segmentation consistency and suppress unstable/low-quality Gaussian points.
  • The paper reports experiments on open-vocabulary 3D object selection and 3D point cloud understanding, together with ablation studies, as evidence of the method's effectiveness and robustness.
  • The core goal is to overcome two limitations of existing 3DGS-based methods: dense pairwise comparisons and inconsistent hierarchical labels derived from 2D priors, both of which hinder feature learning and lead to suboptimal segmentation.
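
To make the tree-guided idea concrete, here is a minimal sketch of how a multi-level object tree could restrict which segments are contrasted against each other. It is not the paper's implementation: the `TreeNode` class, the `contrast_pairs` rule (all object pairs at the global level, sibling pairs only at part levels), and the demo scene are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TreeNode:
    """One segment in a hypothetical object tree (level 0 = whole object)."""
    name: str
    level: int = 0
    parent: Optional["TreeNode"] = None
    children: List["TreeNode"] = field(default_factory=list)

    def add_child(self, name: str) -> "TreeNode":
        child = TreeNode(name=name, level=self.level + 1, parent=self)
        self.children.append(child)
        return child


def nodes_at_level(roots: List[TreeNode], level: int) -> List[TreeNode]:
    """Collect all nodes at a given depth, in depth-first order."""
    found = []
    stack = list(reversed(roots))
    while stack:
        node = stack.pop()
        if node.level == level:
            found.append(node)
        else:
            stack.extend(reversed(node.children))
    return found


def contrast_pairs(roots: List[TreeNode], level: int) -> List[Tuple[str, str]]:
    """Assumed pairing rule: at level 0 contrast every pair of objects
    (global stage); at deeper levels contrast only siblings that share a
    parent (local stage), instead of every pair of parts in the scene."""
    nodes = nodes_at_level(roots, level)
    pairs = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if level == 0 or a.parent is b.parent:
                pairs.append((a.name, b.name))
    return pairs


def build_demo_scene() -> List[TreeNode]:
    """Hypothetical two-object scene used only for this example."""
    chair = TreeNode("chair")
    for part in ("chair/seat", "chair/back", "chair/leg"):
        chair.add_child(part)
    table = TreeNode("table")
    for part in ("table/top", "table/leg"):
        table.add_child(part)
    return [chair, table]
```

In this demo scene the five part-level segments would yield ten pairs under dense pairwise comparison, while the sibling-only rule yields four, which illustrates the kind of redundancy reduction the key points describe.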

Abstract

3D Gaussian Splatting (3DGS) has emerged as a real-time, differentiable representation for neural scene understanding. However, existing 3DGS-based methods struggle to represent hierarchical 3D semantic structures and capture whole-part relationships in complex scenes. Moreover, dense pairwise comparisons and inconsistent hierarchical labels from 2D priors hinder feature learning, resulting in suboptimal segmentation. To address these limitations, we introduce TreeGaussian, a tree-guided cascaded contrastive learning framework that explicitly models hierarchical semantic relationships and reduces redundancy in contrastive supervision. By constructing a multi-level object tree, TreeGaussian enables structured learning across object-part hierarchies. In addition, we propose a two-stage cascaded contrastive learning strategy that progressively refines feature representations from global to local, mitigating saturation and stabilizing training. A Consistent Segmentation Detection (CSD) mechanism and a graph-based denoising module are further introduced to align segmentation modes across views while suppressing unstable Gaussian points, enhancing segmentation consistency and quality. Extensive experiments, including open-vocabulary 3D object selection, 3D point cloud understanding, and ablation studies, demonstrate the effectiveness and robustness of our approach.
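
The abstract does not specify how the graph-based denoising module works, so the following is only a rough sketch of one plausible form: build a k-nearest-neighbor graph over Gaussian centers and flag points whose segment label disagrees with most of their neighbors. The function names, the label-agreement criterion, and the `k` and `min_agree` parameters are all assumptions, not details from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def knn_indices(points: Sequence[Point], k: int) -> List[List[int]]:
    """Brute-force k-nearest neighbors (O(n^2)); fine for a small demo."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors


def denoise_mask(points: Sequence[Point], labels: Sequence[int],
                 k: int = 3, min_agree: float = 0.5) -> List[bool]:
    """Return True for points to keep. Assumed criterion: a point is kept
    when at least `min_agree` of its k nearest neighbors share its label."""
    keep = []
    for i, nbrs in enumerate(knn_indices(points, k)):
        if not nbrs:
            keep.append(True)  # an isolated point has nothing to disagree with
            continue
        agree = sum(labels[j] == labels[i] for j in nbrs) / len(nbrs)
        keep.append(agree >= min_agree)
    return keep


# Hypothetical data: two tight clusters, with one mislabeled point (index 3).
demo_points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0), (5.1, 5.1, 5.0),
]
demo_labels = [0, 0, 0, 1, 1, 1, 1, 1]
demo_keep = denoise_mask(demo_points, demo_labels)
```

Here only the mislabeled point in the first cluster is flagged for removal. A real system would likely operate on learned feature vectors and use a spatial index rather than brute-force search.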
