PinPoint: Monocular Needle Pose Estimation for Robotic Suturing via Stein Variational Newton and Geometric Residuals

arXiv cs.RO / 3/25/2026


Key Points

  • The paper introduces PinPoint, a probabilistic monocular needle pose estimation framework for autonomous robotic suturing that explicitly represents depth ambiguity and rotational symmetry as a multimodal distribution of pose hypotheses.
  • PinPoint uses variational inference with a Stein Variational Newton procedure, combining analytical geometric likelihoods with closed-form Jacobians and robot-grasp constraints to efficiently guide particle hypotheses toward high-probability regions while preserving diversity.
  • Experimental results on real needle-tracking sequences show large error reductions versus a particle-filter baseline, including an 80% decrease in mean translational error and a 78% decrease in rotational error, alongside substantially better-calibrated uncertainty.
  • On induced-rotation sequences that intensify monocular ambiguity, PinPoint preserves a bimodal posterior 84% of the time, nearly triple the baseline rate, avoiding premature mode collapse.
  • Ex vivo suturing tests demonstrate stable needle tracking through intermittent occlusion and even full embedding, with average errors during occlusion of 1.34 mm in translation and 19.18° in rotation.
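
To make the monocular ambiguity in the first key point concrete, here is a minimal toy sketch in Python. It is not the paper's model. It assumes the needle is an ideal semicircular arc, that the two candidate poses differ only by the sign of a tilt about the camera x-axis, and that the camera is an ideal pinhole. The radius, depths, focal length, and all function names are hypothetical values chosen for illustration. Under orthographic projection the two poses produce identical images, and under perspective projection the gap between them shrinks as the needle moves away from the camera.

```python
import numpy as np

def arc_points(radius_mm, tilt_rad, n=50):
    """Semicircular arc in the x-y plane, tilted about the camera x-axis."""
    t = np.linspace(0.0, np.pi, n)
    pts = np.stack(
        [radius_mm * np.cos(t), radius_mm * np.sin(t), np.zeros(n)], axis=1
    )
    c, s = np.cos(tilt_rad), np.sin(tilt_rad)
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    return pts @ rot_x.T

def project_orthographic(pts):
    """Drop the depth coordinate (no perspective cue at all)."""
    return pts[:, :2]

def project_pinhole(pts, depth_mm, focal_px):
    """Ideal pinhole projection with the arc centred at depth_mm."""
    z = depth_mm + pts[:, 2:3]
    return focal_px * pts[:, :2] / z

def max_pixel_gap(depth_mm, radius_mm=10.0, tilt_deg=30.0, focal_px=500.0):
    """Largest image distance between the +tilt and -tilt hypotheses."""
    tilt = np.deg2rad(tilt_deg)
    a = project_pinhole(arc_points(radius_mm, +tilt), depth_mm, focal_px)
    b = project_pinhole(arc_points(radius_mm, -tilt), depth_mm, focal_px)
    return float(np.max(np.linalg.norm(a - b, axis=1)))

# Orthographic case: the two tilted arcs are indistinguishable.
ortho_pos = project_orthographic(arc_points(10.0, np.deg2rad(30.0)))
ortho_neg = project_orthographic(arc_points(10.0, np.deg2rad(-30.0)))
ortho_gap = float(np.max(np.abs(ortho_pos - ortho_neg)))

# Perspective case: the gap is nonzero but shrinks with distance.
gap_near = max_pixel_gap(depth_mm=80.0)
gap_far = max_pixel_gap(depth_mm=400.0)
print(ortho_gap, gap_near, gap_far)
```

In this toy setup the only cue separating the two hypotheses is the perspective term, which is why an estimator that keeps both modes alive, rather than committing to one, can be the safer choice when that cue is weak.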

Abstract

Reliable estimation of surgical needle 3D position and orientation is essential for autonomous robotic suturing, yet existing methods operate almost exclusively under stereoscopic vision. In monocular endoscopic settings, common in transendoscopic and intraluminal procedures, depth ambiguity and rotational symmetry render needle pose estimation inherently ill-posed, producing a multimodal distribution over feasible configurations, rather than a single, well-grounded estimate. We present PinPoint, a probabilistic variational inference framework that treats this ambiguity directly, maintaining a distribution of pose hypotheses rather than suppressing it. PinPoint combines monocular image observations with robot-grasp constraints through analytical geometric likelihoods with closed-form Jacobians. This framework enables efficient Gauss-Newton preconditioning in a Stein Variational Newton inference, where second-order particle transport deterministically moves particles toward high-probability regions while kernel-based repulsion preserves diversity in the multimodal structure. On real needle-tracking sequences, PinPoint reduces mean translational error by 80% (down to 1.00 mm) and rotational error by 78% (down to 13.80°) relative to a particle-filter baseline, with substantially better-calibrated uncertainty. On induced-rotation sequences, where monocular ambiguity is most severe, PinPoint maintains a bimodal posterior 84% of the time, almost three times the rate of the particle-filter baseline, correctly preserving the alternative hypothesis rather than committing prematurely to one mode. Suturing experiments in ex vivo tissue demonstrate stable tracking through intermittent occlusion, with average errors during occlusion of 1.34 mm in translation and 19.18° in rotation, even when the needle is fully embedded.
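
The interplay the abstract describes, curvature-scaled transport toward high-probability regions plus kernel repulsion that keeps particles spread out, can be sketched on a one-dimensional toy problem. The Python code below is a heavily simplified stand-in, not PinPoint's implementation. It assumes a hypothetical scalar residual r(x) = x² − 1 with Gaussian noise, which gives a posterior with two modes near x = −1 and x = +1, and it uses a per-particle scalar curvature built from the Gauss-Newton term and the kernel gradient, in the spirit of a block-diagonal Stein Variational Newton update. All names, the bandwidth, the step size, and the damping value are illustrative assumptions.

```python
import numpy as np

def residual(x):
    """Hypothetical scalar residual with roots at x = -1 and x = +1."""
    return x**2 - 1.0

def jacobian(x):
    """Closed-form derivative of the toy residual."""
    return 2.0 * x

def grad_log_post(x, sigma):
    """Gradient of log p(x) = -r(x)^2 / (2 sigma^2)."""
    return -residual(x) * jacobian(x) / sigma**2

def gauss_newton_curvature(x, sigma):
    """Gauss-Newton approximation J^2 / sigma^2 of the negative Hessian."""
    return jacobian(x) ** 2 / sigma**2

def svn_like_step(x, sigma, bandwidth, step_size, damping=1e-3):
    """One simplified, curvature-preconditioned Stein variational update."""
    diff = x[:, None] - x[None, :]                 # diff[i, j] = x_i - x_j
    k = np.exp(-diff**2 / (2.0 * bandwidth**2))    # RBF kernel k(x_j, x_i)
    grad_k = diff / bandwidth**2 * k               # d k(x_j, x_i) / d x_j
    g = grad_log_post(x, sigma)
    h_gn = gauss_newton_curvature(x, sigma)
    # Attraction (kernel-weighted gradients) plus repulsion (kernel gradient).
    phi = (k * g[None, :] + grad_k).mean(axis=1)
    # Per-particle scalar curvature used to scale the step.
    curv = (k**2 * h_gn[None, :] + grad_k**2).mean(axis=1)
    return x + step_size * phi / (curv + damping)

def run(n_particles=20, n_iters=300, sigma=0.3, bandwidth=0.3, step_size=0.3):
    """Start from an evenly spaced grid and iterate the update."""
    x = np.linspace(-2.0, 2.0, n_particles)
    for _ in range(n_iters):
        x = svn_like_step(x, sigma, bandwidth, step_size)
    return x

final = run()
print("particles near +1:", int(np.sum(final > 0)))
print("particles near -1:", int(np.sum(final < 0)))
```

Dropping the `grad_k` term from `phi` removes the repulsion, which is the ingredient that discourages particles within a mode from piling onto a single point. The real method works over full needle poses with image and grasp likelihoods, so this sketch only illustrates the update structure, not its accuracy.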