
The First Healthcare Robotics Dataset and Foundational Physical AI Models for Healthcare Robotics

Hugging Face Blog / 3/17/2026

📰 News · Models & Research

Key Points

  • The article announces the release of the first healthcare robotics dataset along with foundational physical AI models tailored for healthcare robotics.
  • The dataset provides real-world robotic interaction data to support research and benchmarking in clinical robotics applications.
  • The foundational physical AI models are designed to enable more generalizable perception, planning, and control for healthcare robots across tasks and devices.
  • The release aims to promote open, reproducible research and foster collaboration between academia and industry in the healthcare robotics field.


Published March 16, 2026

Introducing Open-H-Embodiment: The first healthcare robotics open dataset, built by a community collaboration

Authors: Nigel Nelson, Lukas Zbinden, Mostafa Toloui, Sean Huver

Healthcare AI has mainly been perception-based, focusing on models that interpret signals and classify or segment pathology/anatomy. However, healthcare involves "doing," making the static, perception-only datasets of the past—which lack embodiment, contact dynamics, and closed-loop control—insufficient. The field needs standardized robot bodies, synchronized vision–force–kinematics data, sim-to-real pairing, and cross-embodiment benchmarks to build the foundation for Physical AI.

1. Open-H-Embodiment

Open-H-Embodiment is a community‑driven dataset initiative building the open, shared foundation needed to train and evaluate AI autonomy and world foundation models for surgical robotics and ultrasound. Started by a steering committee including Prof. Axel Krieger (Johns Hopkins), Prof. Nassir Navab (Technical University of Munich), and Dr. Mahdi Azizian (NVIDIA), the effort now spans 35 organizations.

Participants from around the world came together to build the first large-scale dataset to advance physical AI in healthcare robotics.

Figure: Open-H-Embodiment sample data.

Participants

Balgrist, CMR Surgical, The Chinese University of Hong Kong, Great Bay University, Hong Kong Baptist University, Hamlyn, ImFusion, Johns Hopkins University, Leeds University, Mohamed bin Zayed University of Artificial Intelligence, Moon Surgical, NVIDIA, Northwell Health, Obuda University, The Hong Kong Polytechnic University, Qilu Hospital of Shandong University, Rob Surgical, Sanoscience, Surgical Data Science Collective, Semaphor Surgical, Stanford, Dresden University of Technology, Technical University of Munich, Tuodao, Turin, University of British Columbia, UC Berkeley, UC San Diego, University of Illinois Chicago, University of Tennessee, University of Texas, Vanderbilt, and Virtual Incision.

The Dataset

  • Comprises 778 hours of CC-BY-4.0 healthcare robotics training data, largely surgical robotics, but also ultrasound and colonoscopy autonomy data.
  • Spans simulation, benchtop exercises (e.g., suturing), and real clinical procedures.
  • Uses commercial robots (CMR Surgical, Rob Surgical, Tuodao) and research robots (dVRK, Franka, Kuka).
  • Released alongside two new, permissively open-source models post-trained on this data.

2. GR00T-H: Vision Language Action Model for Surgical Robotics

First is GR00T-H, a derivative of the Isaac GR00T N series of Vision-Language-Action (VLA) models. Trained on roughly 600 hours of Open-H-Embodiment data, GR00T-H is the first policy model for surgical robotics tasks.

Building on NVIDIA’s open-source ecosystem, Isaac GR00T-H leverages Cosmos Reason 2 2B as its Vision-Language Model (VLM) backbone.


Architectural Design Choices

Surgical robotics requires high precision, but specialized hardware (like cable-driven systems) makes imitation learning (IL) difficult. GR00T-H addresses this with four key design choices (a sketch of the embodiment-projector idea follows the list):

  • Unique Embodiment Projectors: A unique, learnable MLP maps each robot's specific kinematics to a shared, normalized action space.
  • State Dropout (100%): Proprioceptive input is always dropped, so the state pathway reduces to a learned bias term for each system, which yielded better real-world results.
  • Relative EEF Actions: Training uses a common relative End-Effector (EEF) action space to overcome kinematic inconsistencies.
  • Metadata in Task Prompts: Instrument names and control index mapping are injected directly into the VLM task prompt.
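
As a rough illustration of the first design choice, the sketch below gives each robot its own learnable MLP projector that maps its native action dimensions into a shared, normalized action space. Layer sizes, dimension counts, and the tanh normalization are assumptions for illustration, not the released GR00T-H implementation.

```python
# Minimal sketch of a per-embodiment projector. Layer sizes, native/shared
# dimensions, and tanh normalization are illustrative assumptions, not the
# released GR00T-H implementation.
import torch
import torch.nn as nn

class EmbodimentProjector(nn.Module):
    """Learnable MLP mapping one robot's native action dims into a shared, normalized space."""
    def __init__(self, native_dim: int, shared_dim: int = 44, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(native_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, shared_dim),
        )

    def forward(self, native_action: torch.Tensor) -> torch.Tensor:
        # Squash into [-1, 1] so every embodiment lands in the same normalized range.
        return torch.tanh(self.mlp(native_action))

# One projector per embodiment; the shared policy backbone only ever sees shared_dim vectors.
projectors = nn.ModuleDict({
    "dvrk": EmbodimentProjector(native_dim=14),
    "franka": EmbodimentProjector(native_dim=8),
})
shared_action = projectors["dvrk"](torch.randn(1, 14))  # shape: (1, 44)
```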

A prototype of GR00T-H has demonstrated the ability to execute a complete, end-to-end suture in the SutureBot benchmark, highlighting robust long-horizon dexterity.

Figure: GR00T-H performing end-to-end suturing.


3. Cosmos-H-Surgical-Simulator

Cosmos-H-Surgical-Simulator is a World Foundation Model (WFM) for action-conditioned surgical robotics simulation. Traditional physics simulators struggle to capture real-world complexities such as soft-tissue deformation, reflections, blood, and smoke.

Key Capabilities

  • Overcoming the Sim-to-Real Gap: Fine-tuned from NVIDIA Cosmos Predict 2.5 2B, it generates physically plausible surgical video directly from kinematic actions.
  • Efficiency Gains: Generating 600 rollouts took roughly 40 minutes in simulation versus about 2 days of real-world benchtop collection (see the rollout sketch after this list).
  • WFM as a Physics Simulator: Implicitly learns tissue deformation and tool interaction from data.
  • Synthetic Data Generation: Generates realistic synthetic video-action pairs to augment underrepresented datasets.
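
To make the rollout workflow concrete, here is a minimal sketch of using an action-conditioned world model as a closed-loop simulator for policy evaluation. The `world_model` and `policy` callables are hypothetical stand-ins; this is not the Cosmos-H-Surgical-Simulator API.

```python
# Minimal sketch of closed-loop policy rollouts through an action-conditioned
# world model. `world_model` and `policy` are hypothetical callables, not the
# Cosmos-H-Surgical-Simulator API.
import numpy as np

def rollout(world_model, policy, first_frame: np.ndarray, horizon: int = 16):
    """Alternate between policy actions and model-predicted frames."""
    frames, actions = [first_frame], []
    for _ in range(horizon):
        action = policy(frames[-1])                   # kinematic action given the latest frame
        next_frame = world_model(frames[-1], action)  # predicted next surgical video frame
        actions.append(action)
        frames.append(next_frame)
    return frames, actions
```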


Fine-Tuning Details

The model was fine-tuned on the Open-H-Embodiment dataset (9 robot embodiments, 32 datasets) using 64x A100 GPUs for approximately 10,000 GPU-hours. It utilizes a unified 44-dimensional action space.
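
For intuition, one common way to share a single action space across heterogeneous embodiments is to zero-pad robot-native commands and carry a validity mask. The sketch below assumes that convention purely for illustration; it is not the documented Cosmos-H-Surgical-Simulator scheme.

```python
# Minimal sketch of packing robot-native commands into a unified 44-dimensional
# action vector via zero-padding plus a validity mask. The padding/masking
# convention is an assumption, not the documented fine-tuning scheme.
import numpy as np

UNIFIED_DIM = 44

def to_unified(native_action: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Zero-pad a native action to UNIFIED_DIM and return (padded, validity_mask)."""
    n = native_action.shape[-1]
    assert n <= UNIFIED_DIM, "native action exceeds the unified action space"
    padded = np.zeros(UNIFIED_DIM, dtype=np.float32)
    mask = np.zeros(UNIFIED_DIM, dtype=bool)
    padded[:n] = native_action
    mask[:n] = True
    return padded, mask

dvrk_cmd = np.random.randn(14).astype(np.float32)  # e.g. a 14-DoF dVRK command (illustrative)
padded, mask = to_unified(dvrk_cmd)                # shapes: (44,), (44,)
```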


4. What is Next: Towards Reasoning For Surgical Robotics

The goal for version 2 of the Open-H-Embodiment effort is to move beyond perceptual control to reasoning-capable autonomy (a "ChatGPT moment" for surgical robotics), where systems can explain, plan, and adapt across long procedures. This requires extending Open-H-Embodiment into reasoning-ready data with annotated task traces capturing intents, outcomes, and failure modes. This effort needs community engagement, and we invite you to get involved: visit our Open-H GitHub repo to help shape the future of healthcare robotics.
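
As a concrete, hypothetical picture of what reasoning-ready annotations could look like, the sketch below shows a task trace with per-step intents, outcomes, and failure modes. The field names and values are illustrative only; no annotation schema has been published for version 2.

```python
# Hypothetical sketch of a "reasoning-ready" task trace. Field names and values
# are illustrative; no annotation schema has been published for Open-H-Embodiment v2.
task_trace = {
    "procedure": "benchtop_suturing",
    "embodiment": "dVRK",
    "steps": [
        {
            "intent": "drive needle through tissue at the marked entry point",
            "start_s": 12.4,
            "end_s": 21.7,
            "outcome": "success",
            "failure_mode": None,
        },
        {
            "intent": "regrasp needle with the left gripper",
            "start_s": 21.7,
            "end_s": 29.0,
            "outcome": "failure",
            "failure_mode": "needle slipped during handoff",
        },
    ],
}
```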


5. Get started today

Access the following resources to start working with the Open-H-Embodiment dataset and models:
