Visual-Tactile Peg-in-Hole Assembly Learning from Peg-out-of-Hole Disassembly

arXiv cs.RO / 4/23/2026


Key Points

  • The paper presents a visual-tactile learning framework for peg-in-hole (PiH) robotic assembly that uses the inverse task, peg-out-of-hole (PooH) disassembly, to reduce exploration cost.
  • It models both PooH and PiH as POMDPs in a shared visual-tactile observation space, then trains a PooH policy and converts its trajectories into expert-like data for PiH via temporal reversal and action randomization.
  • During PiH execution, visual sensing is used to guide the peg-hole approach, while tactile feedback helps correct misalignment and improve contact interaction.
  • Experiments across various peg-hole geometries show the method reduces contact forces by 6.4% versus single-modality baselines and achieves 87.5% success on seen objects and 77.1% on unseen objects, outperforming direct RL training by 18.1% in success rate.
  • The authors provide demos, code, and datasets to support reproduction and further research at the linked project page.
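
The paper's data-generation idea, converting PooH rollouts into expert-like PiH demonstrations via temporal reversal and action randomization, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the assumption that actions are Cartesian motion commands invertible by negation, and the Gaussian noise scheme are all assumptions for illustration; the paper does not specify these details here.

```python
import numpy as np

def reverse_pooh_trajectory(observations, actions, noise_scale=0.05, rng=None):
    """Turn a peg-out-of-hole rollout into peg-in-hole expert-like data.

    observations: per-step records (kinematic/visual/tactile readings).
    actions: per-step action vectors, assumed to be Cartesian motion
             commands so that negation inverts the motion direction.

    Temporal reversal plays the withdrawal backwards so it looks like an
    insertion; negating each action maps "move out" to "move in" (an
    assumption about the action space). Small Gaussian noise implements
    action randomization, diversifying the synthetic expert data.
    """
    rng = np.random.default_rng() if rng is None else rng
    rev_obs = observations[::-1]                       # reverse time order
    rev_actions = [-np.asarray(a) for a in actions[::-1]]  # invert motions
    noisy_actions = [a + rng.normal(0.0, noise_scale, size=a.shape)
                     for a in rev_actions]             # randomize actions
    return rev_obs, noisy_actions
```

A PooH policy's collected rollouts would be passed through this transform, and the resulting (observation, action) pairs used as demonstrations when training the PiH policy.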

Abstract

Peg-in-hole (PiH) assembly is a fundamental yet challenging robotic manipulation task. While reinforcement learning (RL) has shown promise in tackling such tasks, it requires extensive exploration. In this paper, we propose a novel visual-tactile skill learning framework for the PiH task that leverages its inverse task, i.e., peg-out-of-hole (PooH) disassembly, to facilitate PiH learning. Compared to PiH, PooH is inherently easier as it only needs to overcome existing friction without precise alignment, making data collection more efficient. To this end, we formulate both PooH and PiH as Partially Observable Markov Decision Processes (POMDPs) in a unified environment with a shared visual-tactile observation space. A visual-tactile PooH policy is first trained; its trajectories, containing kinematic, visual, and tactile information, are temporally reversed and action-randomized to provide expert data for PiH. During policy learning, visual sensing facilitates the peg-hole approach, while tactile measurements compensate for peg-hole misalignment. Experiments across diverse peg-hole geometries show that the visual-tactile policy attains 6.4% lower contact forces than its single-modality counterparts, and that our framework achieves average success rates of 87.5% on seen objects and 77.1% on unseen objects, outperforming direct RL methods that train PiH policies from scratch by 18.1% in success rate. Demos, code, and datasets are available at https://sites.google.com/view/pooh2pih.