TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance

arXiv cs.RO / 5/1/2026

Key Points

  • TouchGuide is a two-stage visuomotor approach that uses tactile feedback at inference time to improve contact-rich robotic manipulation.
  • It first generates a coarse, visually plausible action with a pre-trained diffusion or flow-matching policy, then refines that action using a task-specific Contact Physical Model (CPM) guided by touch.
  • The CPM is trained via contrastive learning on limited expert demonstrations and provides a tactile-informed feasibility score to steer sampling toward actions that satisfy realistic physical contact constraints.
  • To collect high-quality tactile training data affordably, the paper introduces TacUMI, which uses rigid fingertips to capture direct tactile signals.
  • Experiments across five demanding tasks (e.g., shoe lacing and chip handover) show TouchGuide significantly outperforms existing state-of-the-art visuo-tactile policies.
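
The two-stage idea in the points above can be illustrated with a toy sampler. This is a minimal sketch under stated assumptions, not the paper's implementation: the summary does not specify the guidance rule, so the sketch assumes a flow-matching style Euler sampler whose predicted endpoint is nudged along the gradient of a feasibility score once sampling passes a threshold. The names `cpm_score`, `cpm_score_grad`, `sample_action`, `guide_from`, and `guide_scale` are hypothetical, and the quadratic score merely stands in for a learned CPM.

```python
import numpy as np

def cpm_score(action, contact_target):
    """Hypothetical stand-in for the CPM feasibility score (higher is better)."""
    return -0.5 * float(np.sum((action - contact_target) ** 2))

def cpm_score_grad(action, contact_target):
    """Analytic gradient of the toy score with respect to the action."""
    return contact_target - action

def sample_action(visual_target, contact_target, steps=50,
                  guide_from=0.5, guide_scale=0.5, seed=0):
    """Euler sampling along a straight noise-to-action path.

    Stage 1 (t < guide_from): vision only, no tactile guidance.
    Stage 2 (t >= guide_from): the predicted endpoint is shifted along
    the score gradient (an assumed guidance rule, for illustration).
    """
    rng = np.random.default_rng(seed)
    action = rng.standard_normal(visual_target.shape)  # start from noise
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        # Toy "policy": its endpoint prediction is simply the visual target.
        endpoint = visual_target.copy()
        if t >= guide_from:
            endpoint = endpoint + guide_scale * cpm_score_grad(endpoint, contact_target)
        # Velocity that reaches the current endpoint at t = 1.
        velocity = (endpoint - action) / (1.0 - t)
        action = action + dt * velocity
    return action

# Toy 2-D "actions": where vision alone points vs. where contact is feasible.
visual_target = np.array([0.30, 0.10])
contact_target = np.array([0.34, 0.06])

unguided = sample_action(visual_target, contact_target, guide_scale=0.0)
guided = sample_action(visual_target, contact_target)
print("unguided score:", cpm_score(unguided, contact_target))
print("guided score:  ", cpm_score(guided, contact_target))
```

In this toy setup the unguided sample ends at the visual target, while the guided sample ends between the visual and contact targets and therefore receives a higher score. A real system would replace the toy endpoint with the pre-trained policy's prediction and the quadratic score with the learned CPM.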

Abstract

Fine-grained and contact-rich manipulation remains challenging for robots, largely due to the underutilization of tactile feedback. To address this, we introduce TouchGuide, a novel cross-policy visuo-tactile fusion paradigm that fuses modalities within a low-dimensional action space. Specifically, TouchGuide operates in two stages to guide a pre-trained diffusion or flow-matching visuomotor policy at inference time. First, the policy produces a coarse, visually plausible action using only visual inputs during early sampling. Second, a task-specific Contact Physical Model (CPM) provides tactile guidance to steer and refine the action, ensuring it aligns with realistic physical contact conditions. Trained through contrastive learning on limited expert demonstrations, the CPM provides a tactile-informed feasibility score to steer the sampling process toward refined actions that satisfy physical contact constraints. Furthermore, to facilitate TouchGuide training with high-quality and cost-effective data, we introduce TacUMI, a data collection system. TacUMI achieves a favorable trade-off between precision and affordability; by leveraging rigid fingertips, it obtains direct tactile feedback, thereby enabling the collection of reliable tactile data. Extensive experiments on five challenging contact-rich tasks, such as shoe lacing and chip handover, show that TouchGuide consistently and significantly outperforms state-of-the-art visuo-tactile policies.
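
The abstract says the CPM is trained with contrastive learning and outputs a feasibility score, but does not name the loss. As a hedged illustration only, the sketch below assumes an InfoNCE-style objective over paired tactile and action embeddings, with cosine similarity reused as the score. The functions `info_nce_loss` and `feasibility_scores`, the `temperature` parameter, and the identity-matrix embeddings are all hypothetical and stand in for learned encoders and real demonstration data.

```python
import numpy as np

def _normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def info_nce_loss(tactile_emb, action_emb, temperature=0.1):
    """InfoNCE over a batch: row i of each array is a matching (expert) pair;
    every other row in the batch serves as a negative."""
    logits = _normalize(tactile_emb) @ _normalize(action_emb).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def feasibility_scores(tactile_emb, candidate_action_embs):
    """Cosine similarity between one tactile embedding and each candidate action."""
    t = _normalize(tactile_emb[None, :])
    return (_normalize(candidate_action_embs) @ t.T).ravel()

# Toy embeddings: four perfectly aligned pairs, then the same pairs shuffled.
tactile = np.eye(4)
aligned_actions = np.eye(4)
shuffled_actions = np.roll(np.eye(4), 1, axis=0)

loss_aligned = info_nce_loss(tactile, aligned_actions)
loss_shuffled = info_nce_loss(tactile, shuffled_actions)
scores = feasibility_scores(tactile[0], aligned_actions)
print(loss_aligned, loss_shuffled, scores)
```

The aligned batch yields a much lower loss than the shuffled one, and the candidate that matches the tactile embedding receives the highest score. How the actual CPM encodes tactile signals and actions is not described in this summary.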