Continual Hand-Eye Calibration for Open-world Robotic Manipulation

arXiv cs.CV / 4/20/2026


Key Points

  • The paper proposes a continual hand-eye calibration framework for open-world robotic manipulation that addresses catastrophic forgetting during adaptation to new, unseen scene changes.
  • It introduces a Spatial-Aware Replay Strategy (SARS) that builds a geometrically uniform replay buffer to ensure broad coverage of each scene’s pose space using maximally informative viewpoints.
  • It also presents Structure-Preserving Dual Distillation (SPDD), which splits localization knowledge into coarse scene layout and fine pose precision, distilling them separately to reduce forgetting at different levels.
  • In the continual learning process, SARS supplies representative rehearsal samples from prior scenes when a new scene is encountered, while SPDD preserves prior knowledge through structured distillation; the method then updates the replay buffer with selected samples from the new scene.
  • Experiments on multiple public datasets show the approach significantly improves resistance to scene forgetting, maintaining accuracy on prior scenes while still adapting effectively to new ones.
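The core idea behind SARS, selecting geometrically spread-out frames instead of redundant adjacent ones, can be illustrated with greedy farthest-point sampling over camera positions. This is a minimal sketch under that assumption; the paper's actual selection criterion (and its notion of "maximally informative viewpoints") may differ.

```python
import numpy as np

def select_replay_poses(positions, k):
    """Greedily pick k camera positions that spread out over the
    scene's pose space, so near-duplicate adjacent frames are
    skipped in favor of well-separated viewpoints."""
    positions = np.asarray(positions, dtype=float)
    chosen = [0]  # start from an arbitrary seed frame
    # distance from every frame to its nearest already-chosen frame
    dists = np.linalg.norm(positions - positions[0], axis=1)
    while len(chosen) < k:
        idx = int(np.argmax(dists))  # most isolated remaining frame
        chosen.append(idx)
        new_d = np.linalg.norm(positions - positions[idx], axis=1)
        dists = np.minimum(dists, new_d)
    return chosen

# Example: a dense cluster of near-duplicate frames plus a few
# distant viewpoints -- the distant ones get picked over duplicates.
pts = [[0, 0], [0.01, 0], [0, 0.01], [5, 0], [0, 5], [5, 5]]
print(select_replay_poses(pts, 4))  # → [0, 5, 3, 4]
```

Note how frames 1 and 2, which sit millimeters from frame 0, are never selected: the buffer's fixed budget goes to viewpoints that cover new regions of the pose space.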

Abstract

Hand-eye calibration through visual localization is a critical capability for robotic manipulation in open-world environments. However, most deep learning-based calibration models suffer from catastrophic forgetting when adapting to unseen data across open-world scene changes, and simple rehearsal-based continual learning strategies cannot adequately mitigate this issue. To overcome this challenge, we propose a continual hand-eye calibration framework that enables robots to adapt to sequentially encountered open-world manipulation scenes through a spatial replay strategy and structure-preserving distillation. Specifically, a Spatial-Aware Replay Strategy (SARS) constructs a geometrically uniform replay buffer that ensures comprehensive coverage of each scene's pose space, replacing redundant adjacent frames with maximally informative viewpoints. Meanwhile, a Structure-Preserving Dual Distillation (SPDD) decomposes localization knowledge into coarse scene layout and fine pose precision, and distills them separately to alleviate both types of forgetting during continual adaptation. As a new manipulation scene arrives, SARS provides geometrically representative replay samples from all prior scenes, and SPDD applies structured distillation on these samples to retain previously learned knowledge. After training on the new scene, SARS incorporates selected samples from it into the replay buffer for future rehearsal, allowing the model to continuously accumulate multi-scene calibration capability. Experiments on multiple public datasets show strong resistance to scene forgetting: the framework maintains accuracy on past scenes while preserving adaptation to new scenes, confirming its effectiveness.
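SPDD's two-level decomposition can be sketched as a two-term distillation objective: one term aligns the teacher's and student's coarse scene-layout predictions, the other aligns their fine pose outputs. The specific losses below (KL divergence over softmax scores for the coarse term, mean-squared error over pose vectors for the fine term) and the weights `alpha`/`beta` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def dual_distillation_loss(student_coarse, teacher_coarse,
                           student_pose, teacher_pose,
                           alpha=1.0, beta=1.0):
    """Two-term structured distillation: a coarse term keeps the
    student's scene-layout distribution close to the teacher's,
    and a fine term keeps its pose regression close as well."""
    def softmax(x):
        e = np.exp(x - np.max(x))  # shift for numerical stability
        return e / e.sum()

    p_t = softmax(np.asarray(teacher_coarse, dtype=float))
    p_s = softmax(np.asarray(student_coarse, dtype=float))
    # Coarse term: KL(teacher || student) over scene-layout scores.
    kl = float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))
    # Fine term: mean-squared error between pose vectors.
    mse = float(np.mean((np.asarray(student_pose, dtype=float)
                         - np.asarray(teacher_pose, dtype=float)) ** 2))
    return alpha * kl + beta * mse

# A student matching the teacher on both levels incurs zero loss.
print(dual_distillation_loss([1.0, 2.0], [1.0, 2.0],
                             [0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # → 0.0
```

Keeping the two terms separate is what lets the method penalize the two kinds of forgetting independently: drift in the coarse layout term can be weighted differently from degradation in fine pose precision.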