A High-Fidelity Digital Twin for Robotic Manipulation Based on 3D Gaussian Splatting

arXiv cs.RO / 5/5/2026


Key Points

  • The paper proposes a practical framework to build high-fidelity, interactive digital twins for robot manipulation within minutes using sparse RGB inputs.
  • It uses 3D Gaussian Splatting (3DGS) as a unified scene representation to achieve fast and photorealistic reconstruction, addressing slow reconstruction and limited visual fidelity in prior work.
  • The method adds visibility-aware semantic fusion to improve 3D labeling accuracy, and introduces a filter-based geometry conversion approach to generate collision-ready models for planning.
  • The resulting models are integrated into a Unity–ROS2–MoveIt simulation and planning stack, enabling closed-loop motion planning and more reliable real-world robot execution.
  • Pick-and-place experiments with a Franka Emika Panda show that the improved geometric accuracy makes manipulation more robust in real-world trials, supporting sim-to-real transfer in unstructured environments.
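
The summary does not say how the filter-based geometry conversion works. As a rough illustration of the general idea only, the sketch below assumes each Gaussian carries a center, an opacity, and a largest-axis scale; it drops faint or oversized Gaussians and wraps the rest in a padded axis-aligned box. The `Gaussian` class, the function names, and the thresholds are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    """Hypothetical minimal view of one 3DGS primitive."""
    center: Vec3       # position in metres
    opacity: float     # 0.0 (transparent) to 1.0 (opaque)
    max_scale: float   # largest axis standard deviation, in metres


def filter_gaussians(
    gaussians: List[Gaussian],
    min_opacity: float = 0.5,   # assumed threshold, not from the paper
    max_scale: float = 0.05,    # assumed threshold, not from the paper
) -> List[Gaussian]:
    """Keep Gaussians that are opaque and compact enough to count as surface."""
    return [
        g for g in gaussians
        if g.opacity >= min_opacity and g.max_scale <= max_scale
    ]


def collision_box(
    gaussians: List[Gaussian], padding: float = 0.01
) -> Tuple[Vec3, Vec3]:
    """Return (min_corner, max_corner) of a padded axis-aligned bounding box."""
    if not gaussians:
        raise ValueError("no Gaussians left after filtering")
    xs, ys, zs = zip(*(g.center for g in gaussians))
    lo = (min(xs) - padding, min(ys) - padding, min(zs) - padding)
    hi = (max(xs) + padding, max(ys) + padding, max(zs) + padding)
    return lo, hi
```

A box like this could be handed to a motion planner as a simple collision primitive. A real pipeline would more likely produce a mesh or convex hull per object.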

Abstract

Developing high-fidelity, interactive digital twins is crucial for enabling closed-loop motion planning and reliable real-world robot execution, which are essential to advancing sim-to-real transfer. However, existing approaches often suffer from slow reconstruction, limited visual fidelity, and difficulties in converting photorealistic models into planning-ready collision geometry. We present a practical framework that constructs high-quality digital twins within minutes from sparse RGB inputs. Our system employs 3D Gaussian Splatting (3DGS) for fast, photorealistic reconstruction as a unified scene representation. We enhance 3DGS with visibility-aware semantic fusion for accurate 3D labelling and introduce an efficient, filter-based geometry conversion method to produce collision-ready models seamlessly integrated with a Unity-ROS2-MoveIt physics engine. In experiments with a Franka Emika Panda robot performing pick-and-place tasks, we demonstrate that this enhanced geometric accuracy effectively supports robust manipulation in real-world trials. These results demonstrate that 3DGS-based digital twins, enriched with semantic and geometric consistency, offer a fast, reliable, and scalable path from perception to manipulation in unstructured environments.
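
The abstract names visibility-aware semantic fusion without giving details. One plausible reading, shown here purely as an assumption, is that per-view 2D labels are voted onto each Gaussian, with each vote weighted by how visible the Gaussian is in that view and poorly visible views ignored. The function name, the observation tuple layout, and the `min_visibility` threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One observation: (gaussian_id, label_seen_in_this_view, visibility in [0, 1])
Observation = Tuple[int, str, float]


def fuse_labels(
    observations: Iterable[Observation],
    min_visibility: float = 0.2,  # assumed cutoff, not from the paper
) -> Dict[int, str]:
    """Assign each Gaussian the label with the highest visibility-weighted vote."""
    votes: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for gaussian_id, label, visibility in observations:
        if visibility < min_visibility:
            continue  # skip views where the Gaussian is mostly occluded
        votes[gaussian_id][label] += visibility
    return {
        gaussian_id: max(scores, key=scores.get)
        for gaussian_id, scores in votes.items()
    }
```

For example, a Gaussian seen clearly as "cup" in one view and only partly as "table" in others keeps the "cup" label, because the occluded views contribute little or nothing to the vote.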