Physically Grounded 3D Generative Reconstruction under Hand Occlusion using Proprioception and Multi-Contact Touch
arXiv cs.CV / 4/13/2026
Key Points
- The paper presents a multimodal, physically grounded 3D generative reconstruction method for metric-scale amodal object completion under severe hand occlusion using proprioception and multi-contact tactile signals.
- It represents the object as a pose-aware, camera-aligned signed distance field (SDF), learns a compact structure latent via a Structure-VAE, and then models the distribution over that latent space with a conditional flow-matching diffusion model (both components are sketched after this list).
- Training uses a vision-only pretraining stage followed by finetuning on occluded manipulation scenes, conditioning on visible RGB evidence, occlusion/visibility masks, hand latent state, and tactile contact information.
- To improve physical plausibility, the method introduces physics-based objectives and differentiable decoder guidance that reduce hand–object interpenetration and better satisfy tactile contact constraints (see the guidance sketch after this list).
- Experiments in simulation show substantial gains over vision-only baselines for occluded completion, and the approach is validated via transfer to a real humanoid robot.
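To make the second bullet concrete, here is a minimal PyTorch sketch of a Structure-VAE that compresses a voxelized SDF grid into a compact latent. The 64³ resolution, channel widths, and latent size are illustrative assumptions, not the paper's reported architecture.

```python
import torch
import torch.nn as nn

class StructureVAE(nn.Module):
    """Hypothetical Structure-VAE sketch: a 3D conv encoder maps a 64^3
    SDF grid to a Gaussian latent, and a transposed-conv decoder
    reconstructs the grid. All sizes are illustrative assumptions."""
    def __init__(self, latent_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, 4, stride=2, padding=1),    # 64^3 -> 32^3
            nn.ReLU(),
            nn.Conv3d(32, 64, 4, stride=2, padding=1),   # 32^3 -> 16^3
            nn.ReLU(),
            nn.Conv3d(64, 128, 4, stride=2, padding=1),  # 16^3 -> 8^3
            nn.ReLU(),
            nn.Flatten(),
        )
        self.to_mu = nn.Linear(128 * 8 ** 3, latent_dim)
        self.to_logvar = nn.Linear(128 * 8 ** 3, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 ** 3),
            nn.Unflatten(1, (128, 8, 8, 8)),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1),  # 8^3 -> 16^3
            nn.ReLU(),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1),   # 16^3 -> 32^3
            nn.ReLU(),
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1),    # 32^3 -> 64^3 SDF
        )

    def forward(self, sdf_grid):
        h = self.encoder(sdf_grid)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = self.decoder(z)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        return recon, kl  # train with an SDF reconstruction loss + beta * kl
```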
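The generative stage can then be sketched as conditional flow matching over that latent space, in a rectified-flow style: interpolate linearly between Gaussian noise and a data latent, and regress the constant target velocity. `VelocityNet` and `cond` are placeholders; `cond` stands in for whatever fusion of RGB evidence, visibility masks, hand state, and tactile contacts the paper conditions on.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy MLP velocity field; the paper's conditioning architecture is
    not specified here and would be far larger in practice."""
    def __init__(self, latent_dim: int = 256, cond_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 1 + cond_dim, 1024), nn.SiLU(),
            nn.Linear(1024, 1024), nn.SiLU(),
            nn.Linear(1024, latent_dim),
        )

    def forward(self, z_t, t, cond):
        # Condition on the interpolation time t and the fused evidence.
        return self.net(torch.cat([z_t, t, cond], dim=-1))

def flow_matching_loss(velocity_net, z1, cond):
    """One conditional flow-matching training step in the latent space."""
    z0 = torch.randn_like(z1)                         # noise endpoint
    t = torch.rand(z1.shape[0], 1, device=z1.device)  # uniform times
    z_t = (1.0 - t) * z0 + t * z1                     # linear interpolant
    target_v = z1 - z0                                # ground-truth velocity
    pred_v = velocity_net(z_t, t, cond)
    return ((pred_v - target_v) ** 2).mean()
```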
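The physics terms in the fourth bullet can be read as SDF penalties applied through the differentiable decoder at sampling time: hand surface points should never lie inside the object (SDF ≥ 0), and sensed tactile contacts should lie on the object surface (SDF ≈ 0). A minimal guidance sketch, assuming a `decoder(z, points)` callable that returns signed distances at 3D query points (the name, term weighting, and step size are hypothetical):

```python
import torch

def physics_guidance_step(decoder, z, hand_points, contact_points,
                          step_size: float = 0.1):
    """Nudge the structure latent z to reduce physics violations."""
    z = z.detach().requires_grad_(True)
    sdf_hand = decoder(z, hand_points)          # (B, N_hand) signed distances
    sdf_contact = decoder(z, contact_points)    # (B, N_contact)
    penetration = torch.relu(-sdf_hand).mean()  # penalize points inside object
    contact = sdf_contact.abs().mean()          # pull contacts onto the surface
    (grad,) = torch.autograd.grad(penetration + contact, z)
    return (z - step_size * grad).detach()      # gradient step on the latent
```

A step like this could be interleaved with the flow-matching sampler's integration steps, a common pattern for test-time guidance; whether the paper applies it per step or once after sampling is not specified in this summary.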