TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches

arXiv cs.CV · April 13, 2026


Key Points

  • TouchAnything is a diffusion-guided framework for estimating accurate 3D object geometry from only sparse tactile contact measurements collected by a robot, targeting scenarios where vision fails due to occlusion or poor lighting.
  • The method transfers geometric and semantic priors from a pretrained large-scale 2D vision diffusion model to the tactile domain, rather than training category-specific tactile reconstruction networks or diffusion models directly on tactile data.
  • Reconstruction is formulated as an optimization problem that enforces consistency with the sparse tactile constraints while steering solutions toward shapes that align with the diffusion prior.
  • The authors report improved reconstruction accuracy over existing baselines and claim the ability to perform open-world 3D reconstruction for previously unseen object instances based on a coarse class-level description.

Abstract

Accurate object geometry estimation is essential for many downstream tasks, including robotic manipulation and physical interaction. Although vision is the dominant modality for shape perception, it becomes unreliable under occlusions or challenging lighting conditions. In such scenarios, tactile sensing provides direct geometric information through physical contact. However, reconstructing global 3D geometry from sparse local touches alone is fundamentally underconstrained. We present TouchAnything, a framework that leverages a pretrained large-scale 2D vision diffusion model as a semantic and geometric prior for 3D reconstruction from sparse tactile measurements. Unlike prior work that trains category-specific reconstruction networks or learns diffusion models directly from tactile data, we transfer the geometric knowledge encoded in pretrained visual diffusion models to the tactile domain. Given sparse contact constraints and a coarse class-level description of the object, we formulate reconstruction as an optimization problem that enforces tactile consistency while guiding solutions toward shapes consistent with the diffusion prior. Our method reconstructs accurate geometries from only a few touches, outperforms existing baselines, and enables open-world 3D reconstruction of previously unseen object instances. Our project page is https://grange007.github.io/touchanything.
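The abstract does not spell out the optimization, but its structure — a data term enforcing that touch points lie on the reconstructed surface, plus a prior term pulling the shape toward what the diffusion model considers plausible — can be illustrated with a deliberately tiny analogue. In the sketch below, the shape is reduced to a single unknown sphere radius, the "tactile consistency" term penalizes touch points that are off the surface, and a quadratic pull toward a class-level prior radius stands in for the diffusion guidance. Every name, weight, and function here is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

# Toy analogue of tactile-constrained reconstruction with a shape prior.
# Shape = sphere of unknown radius r; "touches" are 3D contact points.
# Loss = mean squared surface violation + lam * deviation from a
# class-level prior radius (stand-in for the diffusion prior).
# All parameter names and values are illustrative assumptions.

def reconstruct_radius(touch_points, prior_radius, lam=0.1,
                       lr=0.05, steps=500):
    """Gradient descent on: mean((r - |p|)^2) + lam * (r - r_prior)^2."""
    r = prior_radius                               # start at the prior
    dists = np.linalg.norm(touch_points, axis=1)   # contact distances
    for _ in range(steps):
        grad_touch = 2.0 * np.mean(r - dists)      # tactile-consistency term
        grad_prior = 2.0 * lam * (r - prior_radius)  # prior-guidance term
        r -= lr * (grad_touch + grad_prior)
    return r

# Three sparse touches on a unit sphere; the prior over-estimates r as 1.5.
touches = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
r_hat = reconstruct_radius(touches, prior_radius=1.5)
# r_hat lands between the tactile evidence (1.0) and the prior (1.5),
# weighted by lam — here close to 1.05.
```

The design point this illustrates: with only a few touches the data term alone is underconstrained (in the real problem, infinitely many surfaces fit three points), so the prior term is what selects a plausible shape among the consistent ones; the paper's contribution is using a pretrained 2D vision diffusion model to supply that prior instead of a hand-crafted or category-specific one.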