Warm-Started Reinforcement Learning for Iterative 3D/2D Liver Registration

arXiv cs.CV / 4/14/2026


Key Points

  • Registering preoperative CT to laparoscopic video for AR-guided minimally invasive surgery is difficult for supervised learning: such methods often produce only coarse alignments that require slower optimization-based refinement.
  • The paper proposes a discrete-action reinforcement learning framework that treats registration as sequential decision-making, learning 6-DoF rigid transformation updates and an explicit stopping policy.
  • A key design is “warm-starting” the RL feature encoder from a supervised pose estimation network to stabilize geometric features and speed up convergence.
  • On a public laparoscopic dataset, the method reports an average target registration error (TRE) of 15.70 mm, comparable to supervised methods that require optimization, while converging faster.
  • The approach aims to enable automated iterative registration with less manual tuning (step sizes and stopping criteria) and is positioned as a foundation for future continuous-action and deformable registration work.
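The iterative loop sketched in the key points above can be illustrated with a minimal toy implementation. This is not the paper's method: the step size, the 13-action layout (±step along each of the 6 DoF plus an explicit STOP), and the greedy stand-in for the learned policy head are all hypothetical choices for illustration.

```python
import numpy as np

STEP = 1.0          # illustrative step: mm for translations, degrees for rotations
N_ACTIONS = 13      # +/- step along each of 6 DoF, plus an explicit STOP action
STOP = 12

def apply_action(pose, action, step=STEP):
    """Apply one discrete update: action k nudges DoF k//2 by +/- step."""
    pose = pose.copy()
    dof, sign = divmod(action, 2)
    pose[dof] += step if sign == 0 else -step
    return pose

def greedy_policy(pose, target):
    """Toy stand-in for the learned policy head: pick the action that most
    reduces the pose error, or STOP once every DoF is within half a step."""
    err = target - pose
    k = int(np.argmax(np.abs(err)))
    if abs(err[k]) < STEP / 2:
        return STOP
    return 2 * k + (0 if err[k] > 0 else 1)

def register(init_pose, target, max_iters=100):
    """Sequential decision loop: query the policy, step, stop when it says so."""
    pose = np.asarray(init_pose, dtype=float)
    for _ in range(max_iters):
        action = greedy_policy(pose, target)
        if action == STOP:
            break
        pose = apply_action(pose, action)
    return pose
```

In the actual framework the policy would consume encoder features of the CT rendering and laparoscopic frame rather than the ground-truth target, but the loop structure (discrete 6-DoF updates plus a learned stopping decision) is the same.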

Abstract

Registration between preoperative CT and intraoperative laparoscopic video plays a crucial role in augmented reality (AR) guidance for minimally invasive surgery. Learning-based methods have recently achieved registration errors comparable to optimization-based approaches while offering faster inference. However, many supervised methods produce coarse alignments that rely on additional optimization-based refinement, thereby increasing inference time. We present a discrete-action reinforcement learning (RL) framework that formulates CT-to-video registration as a sequential decision-making process. A shared feature encoder, warm-started from a supervised pose estimation network to provide stable geometric features and faster convergence, extracts representations from CT renderings and laparoscopic frames, while an RL policy head learns to choose rigid transformations along six degrees of freedom and to decide when to stop the iteration. Experiments on a public laparoscopic dataset demonstrated that our method achieved an average target registration error (TRE) of 15.70 mm, comparable to supervised approaches with optimization, while achieving faster convergence. The proposed RL-based formulation enables automated, efficient iterative registration without manually tuned step sizes or stopping criteria. This discrete framework provides a practical foundation for future continuous-action and deformable registration models in surgical AR applications.
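The warm-starting idea in the abstract, initializing the RL agent's shared encoder from a supervised pose estimation network while leaving the policy head freshly initialized, can be sketched as a partial parameter transfer. The parameter names, shapes, and dict-based representation below are hypothetical placeholders, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(shapes, rng):
    """Randomly initialize a named set of parameter arrays."""
    return {name: rng.normal(size=shape) for name, shape in shapes.items()}

# Hypothetical layer names/shapes shared by both networks.
shapes = {
    "encoder.conv1": (8, 3, 3, 3),
    "encoder.conv2": (16, 8, 3, 3),
    "policy_head.fc": (13, 16),   # 13 discrete actions incl. STOP (illustrative)
}

pretrained = init_params(shapes, rng)  # stands in for the supervised pose network
agent = init_params(shapes, rng)       # fresh RL agent

# Warm start: copy only the shared encoder; the policy head stays random
# and is trained from scratch with RL.
for name in agent:
    if name.startswith("encoder."):
        agent[name] = pretrained[name].copy()
```

In a deep-learning framework this would typically be a filtered `state_dict` load; the point is that only the geometric feature extractor is transferred, which is what stabilizes features and speeds convergence per the abstract.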