TwinOR: Photorealistic Digital Twins of Dynamic Operating Rooms for Embodied AI Research

arXiv cs.RO / 4/17/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • TwinOR is introduced as a “real-to-sim” infrastructure that builds photorealistic, dynamic digital twins of operating rooms to support safe embodied AI research and continual evaluation.
  • The system reconstructs static OR geometry with centimeter-level accuracy while continuously modeling human and equipment motion, fusing both into an immersive 3D environment for controllable simulation.
  • TwinOR can generate sensor-realistic data by synthesizing stereo and monocular RGB streams as well as depth observations, enabling tasks like geometry understanding and visual localization.
  • In experiments, pretrained stereo-matching and SLAM models (FoundationStereo and ORB-SLAM3, respectively) evaluated on TwinOR-synthesized data perform within the accuracy ranges they report on real-world indoor benchmarks.
  • By providing a perception-grounded pipeline for automatically constructing dynamic OR twins, TwinOR aims to bridge embodied intelligence training from simulation toward real clinical settings.
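The stereo evaluation described above is typically scored by converting the twin's ground-truth depth into disparity via the pinhole stereo model (disparity = f·B / z) and measuring end-point error (EPE) against the network's prediction. A minimal sketch of that metric, assuming NumPy and hypothetical focal length and baseline values (none of this code is from the paper):

```python
import numpy as np

def depth_to_disparity(depth, focal_px, baseline_m):
    """Convert metric depth to stereo disparity: d = f * B / z.
    Pixels with depth <= 0 are treated as invalid (no ground truth)."""
    depth = np.asarray(depth, dtype=np.float64)
    disp = np.zeros_like(depth)
    valid = depth > 0
    disp[valid] = focal_px * baseline_m / depth[valid]
    return disp, valid

def end_point_error(pred_disp, gt_disp, valid):
    """Mean absolute disparity error (EPE) over valid pixels."""
    return float(np.abs(pred_disp - gt_disp)[valid].mean())

# Toy example: a 2x2 depth map with f = 500 px, B = 0.1 m (illustrative values).
gt_depth = np.array([[2.0, 4.0],
                     [5.0, 0.0]])          # 0 marks a missing-depth pixel
gt_disp, valid = depth_to_disparity(gt_depth, focal_px=500.0, baseline_m=0.1)
pred_disp = gt_disp + 0.5                  # pretend the model is off by half a pixel
print(end_point_error(pred_disp, gt_disp, valid))  # → 0.5
```

The invalid-pixel mask matters in practice: synthesized depth maps, like real sensor depth, can contain holes that must be excluded from the average.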

Abstract

Developing embodied AI for intelligent surgical systems requires safe, controllable environments for continual learning and evaluation. However, safety regulations and operational constraints in operating rooms (ORs) limit agents from freely perceiving and interacting in realistic settings. Digital twins provide high-fidelity, risk-free environments for exploration and training. How we may create dynamic digital representations of ORs that capture relevant spatial, visual, and behavioral complexity remains an open challenge. We introduce TwinOR, a real-to-sim infrastructure for constructing photorealistic and dynamic digital twins of ORs. The system reconstructs static geometry and continuously models human and equipment motion. The static and dynamic components are fused into an immersive 3D environment that supports controllable simulation and facilitates future embodied exploration. The proposed framework reconstructs complete OR geometry with centimeter-level accuracy while preserving dynamic interaction across surgical workflows. In our experiments, TwinOR synthesizes stereo and monocular RGB streams as well as depth observations for geometry understanding and visual localization tasks. Models such as FoundationStereo and ORB-SLAM3 evaluated on TwinOR-synthesized data achieve performance within their reported accuracy ranges on real-world indoor datasets, demonstrating that TwinOR provides sensor-level realism sufficient for emulating real-world perception and localization challenges. By establishing a perception-grounded real-to-sim pipeline, TwinOR enables the automatic construction of dynamic, photorealistic digital twins of ORs. As a safe and scalable environment for experimentation, TwinOR opens new opportunities for translating embodied intelligence from simulation to real-world clinical environments.
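The ORB-SLAM3 evaluation mentioned in the abstract is conventionally scored with absolute trajectory error (ATE): the estimated camera trajectory is rigidly aligned to ground truth (here, poses available from the digital twin) and the residual RMSE is reported. A minimal sketch of that metric using a Kabsch/Umeyama-style alignment without scale, assuming NumPy; this is an illustration of the standard metric, not code from the paper:

```python
import numpy as np

def align_rigid(est, gt):
    """Least-squares rigid alignment (rotation + translation) of an
    estimated trajectory (N x 3) onto ground truth, via SVD (Kabsch)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    U, _, Vt = np.linalg.svd(E.T @ G)        # cross-covariance decomposition
    S = np.eye(3)
    S[2, 2] = np.sign(np.linalg.det(U @ Vt)) # guard against reflections
    R = Vt.T @ S @ U.T                       # rotation mapping est -> gt
    t = mu_g - R @ mu_e
    return est @ R.T + t

def ate_rmse(est, gt):
    """Absolute trajectory error: RMSE of per-pose distances after alignment."""
    aligned = align_rigid(est, gt)
    return float(np.sqrt(((aligned - gt) ** 2).sum(axis=1).mean()))

# Toy check: a trajectory rotated 90 degrees and translated should align to ATE ~ 0.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 1.0, 1.0]])
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
est = gt @ Rz.T + np.array([5.0, -3.0, 2.0])
print(ate_rmse(est, gt))  # → ~0.0
```

Because alignment removes the global pose ambiguity inherent to monocular and stereo SLAM, ATE isolates drift and local estimation error, which is what makes the "within reported accuracy ranges" comparison between synthesized and real data meaningful.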