
The First Healthcare Robotics Dataset and Foundational Physical AI Models for Healthcare Robotics

Hugging Face Blog / 3/17/2026

📰 News · Models & Research

Key Points

  • The article announces the release of the first healthcare robotics dataset along with foundational physical AI models tailored for healthcare robotics.
  • The dataset provides real-world robotic interaction data to support research and benchmarking in clinical robotics applications.
  • The foundational physical AI models are designed to enable more generalizable perception, planning, and control for healthcare robots across tasks and devices.
  • The release aims to promote open, reproducible research and foster collaboration between academia and industry in the healthcare robotics field.


Published March 16, 2026

Introducing Open-H-Embodiment: The first healthcare robotics open dataset, built by a community collaboration

Authors: Nigel Nelson, Lukas Zbinden, Mostafa Toloui, Sean Huver

Healthcare AI has mainly been perception-based, focusing on models that interpret signals and classify or segment pathology/anatomy. However, healthcare involves "doing," making the static, perception-only datasets of the past—which lack embodiment, contact dynamics, and closed-loop control—insufficient. The field needs standardized robot bodies, synchronized vision–force–kinematics data, sim-to-real pairing, and cross-embodiment benchmarks to build the foundation for Physical AI.

1. Open-H-Embodiment

Open-H-Embodiment is a community‑driven dataset initiative building the open, shared foundation needed to train and evaluate AI autonomy and world foundation models for surgical robotics and ultrasound. Started by a steering committee including Prof. Axel Krieger (Johns Hopkins), Prof. Nassir Navab (Technical University of Munich), and Dr. Mahdi Azizian (NVIDIA), the effort now spans 35 organizations.

Participants from around the world came together to build the first large-scale dataset to advance physical AI in healthcare robotics.

Figure: Open-H-Embodiment sample data.

Participants

Balgrist, CMR Surgical, The Chinese University of Hong Kong, Great Bay University, Hong Kong Baptist University, Hamlyn, ImFusion, Johns Hopkins University, Leeds University, Mohamed bin Zayed University of Artificial Intelligence, Moon Surgical, NVIDIA, Northwell Health, Obuda University, The Hong Kong Polytechnic University, Qilu Hospital of Shandong University, Rob Surgical, Sanoscience, Surgical Data Science Collective, Semaphor Surgical, Stanford, Dresden University of Technology, Technical University of Munich, Tuodao, Turin, University of British Columbia, UC Berkeley, UC San Diego, University of Illinois Chicago, University of Tennessee, University of Texas, Vanderbilt, and Virtual Incision.

The Dataset

  • Comprises 778 hours of CC-BY-4.0 healthcare robotics training data, largely surgical robotics, but also ultrasound and colonoscopy autonomy data.
  • Spans simulation, benchtop exercises (e.g., suturing), and real clinical procedures.
  • Uses commercial robots (CMR Surgical, Rob Surgical, Tuodao) and research robots (dVRK, Franka, Kuka).
  • Released alongside two new, permissively open-source models post-trained on this data.

2. GR00T-H: Vision Language Action Model for Surgical Robotics

First is GR00T-H, a derivative of the Isaac GR00T N series of Vision-Language-Action (VLA) models. Trained on roughly 600 hours of Open-H-Embodiment data, GR00T-H is the first policy model for surgical robotics tasks.

Building on NVIDIA’s open-source ecosystem, Isaac GR00T-H leverages Cosmos Reason 2 2B as its Vision-Language Model (VLM) backbone.


Architectural Design Choices

Surgical robotics requires high precision, but specialized hardware (like cable-driven systems) makes imitation learning (IL) difficult. GR00T-H addresses this with four key design choices (a sketch of the embodiment-projector idea follows the list):

  • Unique Embodiment Projectors: A unique, learnable MLP maps each robot's specific kinematics to a shared, normalized action space.
  • State Dropout (100%): Proprioceptive input is always dropped, so the state pathway reduces to a learned bias term for each system, which yielded better real-world results.
  • Relative EEF Actions: Training uses a common relative End-Effector (EEF) action space to overcome kinematic inconsistencies.
  • Metadata in Task Prompts: Instrument names and control index mapping are injected directly into the VLM task prompt.
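
As a rough illustration of the first design choice, the sketch below gives each robot its own learnable MLP projector that maps its native action dimensions into a shared, normalized action space. Layer sizes, dimension counts, and the tanh normalization are assumptions for illustration, not the released GR00T-H implementation.

```python
# Minimal sketch of a per-embodiment projector. Layer sizes, native/shared
# dimensions, and tanh normalization are illustrative assumptions, not the
# released GR00T-H implementation.
import torch
import torch.nn as nn

class EmbodimentProjector(nn.Module):
    """Learnable MLP mapping one robot's native action dims into a shared, normalized space."""
    def __init__(self, native_dim: int, shared_dim: int = 44, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(native_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, shared_dim),
        )

    def forward(self, native_action: torch.Tensor) -> torch.Tensor:
        # Squash into [-1, 1] so every embodiment lands in the same normalized range.
        return torch.tanh(self.mlp(native_action))

# One projector per embodiment; the shared policy backbone only ever sees shared_dim vectors.
projectors = nn.ModuleDict({
    "dvrk": EmbodimentProjector(native_dim=14),
    "franka": EmbodimentProjector(native_dim=8),
})
shared_action = projectors["dvrk"](torch.randn(1, 14))  # shape: (1, 44)
```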

A prototype of GR00T-H has demonstrated the ability to execute a complete, end-to-end suture in the SutureBot benchmark, highlighting robust long-horizon dexterity.

Figure: GR00T-H performing end-to-end suturing.


3. Cosmos-H-Surgical-Simulator

Cosmos-H-Surgical-Simulator is a World Foundation Model (WFM) for action-conditioned surgical robotics simulation. Traditional physics simulators struggle to capture real-world complexities such as soft-tissue deformation, reflections, blood, and smoke.

Key Capabilities

  • Overcoming the Sim-to-Real Gap: Fine-tuned from NVIDIA Cosmos Predict 2.5 2B, it generates physically plausible surgical video directly from kinematic actions.
  • Efficiency Gains: Generating 600 rollouts took roughly 40 minutes in simulation versus about 2 days of real-world benchtop collection (see the rollout sketch after this list).
  • WFM as a Physics Simulator: Implicitly learns tissue deformation and tool interaction from data.
  • Synthetic Data Generation: Generates realistic synthetic video-action pairs to augment underrepresented datasets.
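
To make the rollout workflow concrete, here is a minimal sketch of using an action-conditioned world model as a closed-loop simulator for policy evaluation. The `world_model` and `policy` callables are hypothetical stand-ins; this is not the Cosmos-H-Surgical-Simulator API.

```python
# Minimal sketch of closed-loop policy rollouts through an action-conditioned
# world model. `world_model` and `policy` are hypothetical callables, not the
# Cosmos-H-Surgical-Simulator API.
import numpy as np

def rollout(world_model, policy, first_frame: np.ndarray, horizon: int = 16):
    """Alternate between policy actions and model-predicted frames."""
    frames, actions = [first_frame], []
    for _ in range(horizon):
        action = policy(frames[-1])                   # kinematic action given the latest frame
        next_frame = world_model(frames[-1], action)  # predicted next surgical video frame
        actions.append(action)
        frames.append(next_frame)
    return frames, actions
```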


Fine-Tuning Details

The model was fine-tuned on the Open-H-Embodiment dataset (9 robot embodiments, 32 datasets) using 64x A100 GPUs for approximately 10,000 GPU-hours. It utilizes a unified 44-dimensional action space.
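
For intuition, one common way to share a single action space across heterogeneous embodiments is to zero-pad robot-native commands and carry a validity mask. The sketch below assumes that convention purely for illustration; it is not the documented Cosmos-H-Surgical-Simulator scheme.

```python
# Minimal sketch of packing robot-native commands into a unified 44-dimensional
# action vector via zero-padding plus a validity mask. The padding/masking
# convention is an assumption, not the documented fine-tuning scheme.
import numpy as np

UNIFIED_DIM = 44

def to_unified(native_action: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Zero-pad a native action to UNIFIED_DIM and return (padded, validity_mask)."""
    n = native_action.shape[-1]
    assert n <= UNIFIED_DIM, "native action exceeds the unified action space"
    padded = np.zeros(UNIFIED_DIM, dtype=np.float32)
    mask = np.zeros(UNIFIED_DIM, dtype=bool)
    padded[:n] = native_action
    mask[:n] = True
    return padded, mask

dvrk_cmd = np.random.randn(14).astype(np.float32)  # e.g. a 14-DoF dVRK command (illustrative)
padded, mask = to_unified(dvrk_cmd)                # shapes: (44,), (44,)
```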


4. What is Next: Towards Reasoning For Surgical Robotics

The goal for version 2 of the Open-H-Embodiment effort is to move beyond perceptual control to reasoning-capable autonomy (a "ChatGPT moment" for surgical robotics), where systems can explain, plan, and adapt across long procedures. This requires extending Open-H-Embodiment into reasoning-ready data with annotated task traces capturing intents, outcomes, and failure modes. This effort needs community engagement, and we invite you to get involved: visit our Open-H GitHub repo to help shape the future of healthcare robotics.
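
As a concrete, hypothetical picture of what reasoning-ready annotations could look like, the sketch below shows a task trace with per-step intents, outcomes, and failure modes. The field names and values are illustrative only; no annotation schema has been published for version 2.

```python
# Hypothetical sketch of a "reasoning-ready" task trace. Field names and values
# are illustrative; no annotation schema has been published for Open-H-Embodiment v2.
task_trace = {
    "procedure": "benchtop_suturing",
    "embodiment": "dVRK",
    "steps": [
        {
            "intent": "drive needle through tissue at the marked entry point",
            "start_s": 12.4,
            "end_s": 21.7,
            "outcome": "success",
            "failure_mode": None,
        },
        {
            "intent": "regrasp needle with the left gripper",
            "start_s": 21.7,
            "end_s": 29.0,
            "outcome": "failure",
            "failure_mode": "needle slipped during handoff",
        },
    ],
}
```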


5. Get started today

Access the following resources to start working with the Open-H-Embodiment dataset and models:
