SutureAgent: Learning Surgical Trajectories via Goal-conditioned Offline RL in Pixel Space
arXiv cs.AI / 3/31/2026
Key Points
- The paper presents SutureAgent, which predicts surgical needle trajectories from endoscopic video by reframing the problem as goal-conditioned sequential decision-making in pixel space.
- By modeling the needle tip as an agent that moves step-by-step in pixel coordinates, the method captures continuity between adjacent motion steps and enforces physically plausible state transitions over time.
- It leverages sparse waypoint annotations by densifying them with cubic spline interpolation, converting them into denser supervisory signals that form a reward structure to guide learning.
- The approach uses a variable-length clip observation encoder to capture both spatial detail and long-range temporal context, and predicts future waypoints autoregressively as discrete direction choices paired with continuous step magnitudes.
- SutureAgent is trained with Conservative Q-Learning plus Behavioral Cloning regularization for stable offline optimization; on a new kidney wound suturing dataset (1,158 trajectories from 50 patients), it is reported to reduce Average Displacement Error by 58.6% versus the strongest baseline.
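The waypoint-densification idea in the third bullet can be sketched in plain Python. The paper describes cubic spline interpolation over sparse waypoint annotations; the Catmull-Rom cubic variant below (the `catmull_rom` and `densify` helpers are illustrative names, not from the paper) shows how a handful of annotated 2D pixel waypoints become a dense sequence of intermediate targets that can carry per-step supervision:

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Evaluate a Catmull-Rom cubic (one coordinate) at t in [0, 1)."""
    t2, t3 = t * t, t * t * t
    return 0.5 * (2 * p1
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t3)

def densify(waypoints, steps_per_segment):
    """Turn sparse (x, y) pixel waypoints into a dense trajectory.

    Endpoints are duplicated so every original segment gets a spline;
    the curve passes through every annotated waypoint.
    """
    pts = [waypoints[0]] + list(waypoints) + [waypoints[-1]]
    dense = []
    for i in range(1, len(pts) - 2):
        p0, p1, p2, p3 = pts[i - 1], pts[i], pts[i + 1], pts[i + 2]
        for s in range(steps_per_segment):
            t = s / steps_per_segment
            dense.append((catmull_rom(p0[0], p1[0], p2[0], p3[0], t),
                          catmull_rom(p0[1], p1[1], p2[1], p3[1], t)))
    dense.append(waypoints[-1])
    return dense
```

For example, `densify([(0.0, 0.0), (10.0, 0.0), (20.0, 10.0)], 4)` yields nine points that interpolate the three annotations, giving four supervisory steps per original segment.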
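The action parameterization in the fourth bullet (a discrete direction choice plus a continuous magnitude) can be illustrated with a minimal rollout loop. The eight compass directions and the `step`/`rollout` helpers below are assumptions for illustration, not the paper's actual decoder; they just show how such hybrid actions move a needle-tip agent through pixel coordinates one step at a time:

```python
import math

# Assumed discretization: 8 unit direction vectors at 45-degree intervals.
DIRS = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)]

def step(pos, dir_idx, magnitude):
    """Apply one hybrid action: discrete direction index + continuous pixel magnitude."""
    dx, dy = DIRS[dir_idx]
    return (pos[0] + dx * magnitude, pos[1] + dy * magnitude)

def rollout(start, actions):
    """Autoregressively unroll a trajectory: each step conditions on the last position."""
    traj = [start]
    for dir_idx, magnitude in actions:
        traj.append(step(traj[-1], dir_idx, magnitude))
    return traj
```

Because each action is relative to the previous position, adjacent predictions stay continuous and state transitions remain physically plausible, matching the motivation in the second bullet.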
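The objective in the last bullet combines a TD loss with a CQL conservatism penalty (which pushes down Q-values on out-of-distribution actions relative to dataset actions) and a Behavioral Cloning term. A toy scalar version for a discrete action set, with hypothetical weights `alpha` and `beta` (the paper's exact loss form and hyperparameters are not given here), might look like:

```python
import math

def logsumexp(xs):
    """Numerically stable log-sum-exp over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def cql_bc_loss(q_values, policy_logits, data_action, td_target, alpha, beta):
    """Toy CQL + BC objective for one transition with discrete actions.

    q_values:      Q(s, a) for every action a (list of floats)
    policy_logits: policy logits over the same actions
    data_action:   index of the action taken in the offline dataset
    td_target:     bootstrapped target r + gamma * max_a' Q(s', a')
    """
    # Standard TD error on the dataset action.
    td = (q_values[data_action] - td_target) ** 2
    # CQL penalty: soft-maximum over all actions minus the dataset action's Q.
    cql_penalty = logsumexp(q_values) - q_values[data_action]
    # BC regularizer: cross-entropy of the policy on the dataset action.
    bc = logsumexp(policy_logits) - policy_logits[data_action]
    return td + alpha * cql_penalty + beta * bc
```

The `alpha` term keeps the learned Q-function conservative on unseen actions, while the `beta` term anchors the policy to demonstrated behavior, a common recipe for stable offline optimization.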



