Patient4D: Temporally Consistent Patient Body Mesh Recovery from Monocular Operating Room Video

arXiv cs.CV / 3/19/2026

📰 News · Models & Research

Key Points

  • Patient4D is a stationarity-constrained reconstruction pipeline that recovers dense 3D body meshes from monocular operating-room video, using the prior that the anesthetized patient does not move to cope with drape occlusion and a continuously changing camera viewpoint.
  • The method combines image-level foundation models for perception with lightweight geometric components, notably Pose Locking and Rigid Fallback, to enforce temporal consistency while remaining compatible with off-the-shelf HMR models.
  • Patient4D is evaluated on 4,680 synthetic surgical sequences and three public HMR video benchmarks. Under surgical drape occlusion it reaches a mean IoU of 0.75 and cuts the share of failure frames from 30.5% (best baseline) to 1.3%.
  • The results indicate that leveraging stationarity priors can substantially improve monocular 3D reconstruction in clinical AR scenarios and may broaden the applicability of HMR in surgical settings.
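
To make the two reported metrics concrete, here is a minimal sketch of how a mean IoU and a failure-frame rate could be computed over a sequence. It assumes IoU is measured between binary silhouette masks and that a frame counts as a failure when its IoU falls below a threshold. Both are assumptions for illustration: this summary does not state the paper's exact IoU definition or failure criterion, and the names `mask_iou`, `summarize`, and `fail_threshold` are hypothetical.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask stored as rows of 0/1 values


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumption: two empty masks count as perfect agreement.
    return inter / union if union else 1.0


def summarize(
    pred_masks: List[Mask],
    gt_masks: List[Mask],
    fail_threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Tuple[float, float]:
    """Return (mean IoU, fraction of frames whose IoU is below the cutoff)."""
    ious = [mask_iou(p, g) for p, g in zip(pred_masks, gt_masks)]
    mean_iou = sum(ious) / len(ious)
    failure_rate = sum(iou < fail_threshold for iou in ious) / len(ious)
    return mean_iou, failure_rate
```

Under these assumptions, a reported failure rate of 1.3% would mean that roughly 13 frames in every 1,000 fall below the chosen cutoff.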

Abstract

Recovering a dense 3D body mesh from monocular video remains challenging under occlusion from draping and continuously moving camera viewpoints. This configuration arises in surgical augmented reality (AR), where an anesthetized patient lies under surgical draping while a surgeon's head-mounted camera continuously changes viewpoint. Existing human mesh recovery (HMR) methods are typically trained on upright, moving subjects captured from relatively stable cameras, leading to performance degradation under such conditions. To address this, we present Patient4D, a stationarity-constrained reconstruction pipeline that explicitly exploits the stationarity prior. The pipeline combines image-level foundation models for perception with lightweight geometric mechanisms that enforce temporal consistency across frames. Two key components enable robust reconstruction: Pose Locking, which anchors pose parameters using stable keyframes, and Rigid Fallback, which recovers meshes under severe occlusion through silhouette-guided rigid alignment. Together, these mechanisms stabilize predictions while remaining compatible with off-the-shelf HMR models. We evaluate Patient4D on 4,680 synthetic surgical sequences and three public HMR video benchmarks. Under surgical drape occlusion, Patient4D achieves a 0.75 mean IoU, reducing failure frames from 30.5% to 1.3% compared to the best baseline. Our findings demonstrate that exploiting stationarity priors can substantially improve monocular reconstruction in clinical AR scenarios.
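
The abstract describes Pose Locking only as anchoring pose parameters using stable keyframes. The sketch below illustrates that general idea for a stationary subject and should not be read as the authors' procedure. It assumes each frame comes with a flat pose vector and a scalar confidence score, treats frames above a confidence cutoff as keyframes, and replaces every frame's pose with the per-parameter median over those keyframes. The function name `lock_pose`, the confidence input, the `min_conf` cutoff, and the use of a median are all hypothetical choices. A component-wise median of rotation parameters is also only a rough approximation of rotation averaging.

```python
from statistics import median
from typing import List, Sequence


def lock_pose(
    poses: Sequence[Sequence[float]],
    confidences: Sequence[float],
    min_conf: float = 0.8,  # hypothetical keyframe cutoff
) -> List[List[float]]:
    """Anchor every frame to a single pose estimated from confident keyframes."""
    # Keyframes: frames whose per-frame estimate is considered stable.
    keyframes = [p for p, c in zip(poses, confidences) if c >= min_conf]
    if not keyframes:
        # No stable keyframe available: leave the per-frame poses unchanged.
        return [list(p) for p in poses]
    # Per-parameter median over keyframes is robust to a few outlier estimates.
    locked = [median(values) for values in zip(*keyframes)]
    # The subject is assumed stationary, so the same pose is used for all frames.
    return [list(locked) for _ in poses]
```

A short usage example, in which the third frame is a low-confidence outlier (for instance, a heavily occluded view) that gets overwritten by the locked pose:

```python
poses = [[0.1, 0.2], [0.3, 0.4], [9.0, 9.0], [0.2, 0.3]]
confidences = [0.9, 0.95, 0.2, 0.85]
stabilized = lock_pose(poses, confidences)
```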