Exploring Pose-Guided Imitation Learning for Robotic Precise Insertion

arXiv cs.RO / 3/25/2026



Key Points

The paper addresses why precise robotic insertion is hard in real environments, citing contact-rich dynamics, tight clearances, and limited demonstration data as key bottlenecks for existing visuomotor imitation learning approaches.
It introduces pose-guided imitation learning that uses compact, object-centric relative SE(3) poses and applies a diffusion policy to predict future relative pose trajectories as actions for insertion.
To handle pose estimation noise, the method augments pose features with goal-conditioned RGBD encoding and uses a pose-guided residual gated fusion module that lets RGBD cues compensate when pose estimates are unreliable.
Experiments on six real-robot precise insertion tasks show strong performance with only 7–10 demonstrations per task, including successful operation at clearances as low as 0.01 mm, with better data efficiency and generalization than baselines.
The authors report that code will be released via the provided GitHub link, supporting reproducibility and further research on pose- and diffusion-based insertion policies.
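The relative pose observation described above can be illustrated with a minimal sketch. It assumes poses are stored as 4x4 homogeneous transforms in a shared world frame; the helper names (make_pose, invert_pose, relative_pose, rot_z) are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def invert_pose(T):
    """Invert a rigid transform using the closed form (R^T, -R^T t)."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def relative_pose(T_world_source, T_world_target):
    """Pose of the source object expressed in the target object's frame."""
    return invert_pose(T_world_target) @ T_world_source


def rot_z(angle):
    """Rotation about the z axis by `angle` radians (used only for the demo)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
```

One reason such an observation is object-centric: if the source and target are both moved by the same rigid transform, the relative pose is unchanged, so a policy that sees only this quantity is unaffected by where the pair sits in the workspace.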

Abstract

Imitation learning is promising for robotic manipulation, but precise insertion in the real world remains difficult due to contact-rich dynamics, tight clearances, and limited demonstrations. Many existing visuomotor policies depend on high-dimensional RGB/point-cloud observations, which can be data-inefficient and generalize poorly under pose variations. In this paper, we study pose-guided imitation learning by using object poses in SE(3) as compact, object-centric observations for precise insertion tasks. First, we propose a diffusion policy for precise insertion that observes the relative SE(3) pose of the source object with respect to the target object and predicts a future relative pose trajectory as its action. Second, to improve robustness to pose estimation noise, we augment the pose-guided policy with RGBD cues. Specifically, we introduce a goal-conditioned RGBD encoder to capture the discrepancy between current and goal observations. We further propose a pose-guided residual gated fusion module, where pose features provide the primary control signal and RGBD features adaptively compensate when pose estimates are unreliable. We evaluate our methods on six real-robot precise insertion tasks and achieve high performance with only 7–10 demonstrations per task. In our setup, the proposed policies succeed on tasks with clearances down to 0.01 mm and demonstrate improved data efficiency and generalization over existing baselines. Code will be available at https://github.com/sunhan1997/PoseInsert.
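The abstract does not give the equations of the residual gated fusion module, so the following is only a sketch of one common way to build such a module, not the authors' implementation. It assumes the pose and RGBD features have the same dimension, that the gate is a sigmoid over a linear map of the concatenated features, and that the gated RGBD feature is added to the pose feature as a residual. The function and parameter names (residual_gated_fusion, gate_weight, gate_bias) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-x))


def residual_gated_fusion(pose_feat, rgbd_feat, gate_weight, gate_bias):
    """Fuse features as pose_feat + gate * rgbd_feat (assumed form).

    pose_feat, rgbd_feat: arrays of shape (..., d)
    gate_weight: array of shape (2 * d, d); gate_bias: array of shape (d,)
    """
    # The gate looks at both feature streams and outputs values in (0, 1).
    gate_input = np.concatenate([pose_feat, rgbd_feat], axis=-1)
    gate = sigmoid(gate_input @ gate_weight + gate_bias)
    # Pose features pass through unchanged; RGBD enters only as a gated residual.
    return pose_feat + gate * rgbd_feat
```

With this form, a gate near zero leaves the pose feature as the sole control signal, while a gate near one lets the RGBD feature contribute fully. That matches the behavior the abstract describes, in which RGBD cues compensate when pose estimates are unreliable.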


