Geometrically Plausible Object Pose Refinement using Differentiable Simulation

arXiv cs.RO / 3/24/2026

Key Points

  • The paper addresses object pose estimation failures where predicted poses are geometrically infeasible, such as intersecting the robot hand or floating off support surfaces during dexterous manipulation.
  • It proposes a multi-modal pose refinement pipeline that uses differentiable physics simulation, differentiable rendering, and visuo-tactile sensing to enforce physical and spatial consistency.
  • In simulated experiments, the method reduces object-hand intersection volume error by 73% under accurate initialization and by over 87% under high initial uncertainty, significantly outperforming ICP-based baselines.
  • The geometric plausibility gains are accompanied by improvements in both translation and orientation accuracy, suggesting the refinement balances physical constraints with sensor fidelity.
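
The abstract does not define "intersection volume error" precisely, so the sketch below shows one plausible way to measure it, under loudly hypothetical simplifications: the object is an axis-aligned box, the hand is collapsed to a single sphere, and the overlap volume is estimated by testing the midpoints of a regular grid over the box. The helper `relative_reduction` shows how a figure like "reduced by 73%" maps to before/after values (an error of 1.0 dropping to 0.27). It does not settle whether the paper measures that reduction against the initial estimate or against the baseline. All function names are illustrative, not from the paper.

```python
from itertools import product


def box_sphere_overlap_volume(box_center, box_half, sphere_center, radius, n=40):
    """Approximate vol(box ∩ sphere) by testing the midpoints of an n^3 grid over the box.

    Hypothetical stand-in for an object-hand intersection volume: the object is an
    axis-aligned box, the hand a single sphere.
    """
    steps = [2.0 * h / n for h in box_half]  # grid spacing along x, y, z
    cell_volume = steps[0] * steps[1] * steps[2]
    r2 = radius * radius
    inside = 0
    for i, j, k in product(range(n), repeat=3):
        # Midpoint of grid cell (i, j, k) in world coordinates.
        x = box_center[0] - box_half[0] + (i + 0.5) * steps[0]
        y = box_center[1] - box_half[1] + (j + 0.5) * steps[1]
        z = box_center[2] - box_half[2] + (k + 0.5) * steps[2]
        dx, dy, dz = x - sphere_center[0], y - sphere_center[1], z - sphere_center[2]
        if dx * dx + dy * dy + dz * dz <= r2:
            inside += 1
    return inside * cell_volume


def relative_reduction(before, after):
    """Fractional error reduction: 0.73 means 'reduced by 73%'."""
    return (before - after) / before


# Usage: a pose estimate that pushes the box into the sphere vs. one that does not.
box_half = (0.5, 0.5, 0.5)
v_bad = box_sphere_overlap_volume((0.0, 0.0, 0.0), box_half, (0.6, 0.0, 0.0), 0.3)
v_good = box_sphere_overlap_volume((0.0, 0.0, 0.0), box_half, (0.8, 0.0, 0.0), 0.3)
print(v_bad, v_good)
```

Grid sampling is chosen for clarity over exact polyhedral clipping: its error shrinks with the cell size, and it extends directly to arbitrary inside/outside tests (e.g. mesh signed distances).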

Abstract

State-of-the-art object pose estimation methods are prone to generating geometrically infeasible pose hypotheses. This problem is prevalent in dexterous manipulation, where estimated poses often intersect with the robotic hand or are not lying on a support surface. We propose a multi-modal pose refinement approach that combines differentiable physics simulation, differentiable rendering and visuo-tactile sensing to optimize object poses for both spatial accuracy and physical consistency. Simulated experiments show that our approach reduces the intersection volume error between the object and robotic hand by 73% when the initial estimate is accurate and by over 87% under high initial uncertainty, significantly outperforming standard ICP-based baselines. Furthermore, the improvement in geometric plausibility is accompanied by a concurrent reduction in translation and orientation errors. Achieving pose estimation that is grounded in physical reality while remaining faithful to multi-modal sensor inputs is a critical step toward robust in-hand manipulation.
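
The core idea, trading sensor fidelity against physical constraints inside one gradient-based optimization, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' method: the pose is translation-only, the object is a set of model points with known correspondences to observed points (standing in for the visual and tactile data), the hand is a single sphere, and the support surface is the plane z = 0. Instead of a differentiable simulator and renderer, it uses a hand-written objective with analytic gradients: a data term, a penalty on points inside the hand, and a term pulling the lowest point onto the table. All names and weights are hypothetical.

```python
import math
from itertools import product


def transform(points, t):
    """Apply a translation-only pose t to model points."""
    return [(p[0] + t[0], p[1] + t[1], p[2] + t[2]) for p in points]


def max_penetration(points, t, hand_center, hand_radius):
    """Deepest penetration of any transformed point into the hypothetical spherical hand."""
    depth = 0.0
    for x in transform(points, t):
        sdf = math.dist(x, hand_center) - hand_radius  # signed distance to sphere surface
        depth = max(depth, -sdf)
    return depth


def support_gap(points, t, table_z=0.0):
    """Height of the lowest point above the table: >0 floating, <0 sinking in."""
    return min(p[2] for p in points) + t[2] - table_z


def refine_translation(model_pts, observed_pts, hand_center, hand_radius, t0,
                       w_data=1.0, w_pen=10.0, w_sup=10.0, lr=0.05, steps=500):
    """Gradient descent on t for: data + w_pen * penetration^2 + w_sup * support_gap^2."""
    n = len(model_pts)
    t = list(t0)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for p, q in zip(model_pts, observed_pts):
            x = [p[k] + t[k] for k in range(3)]
            # Data term: mean squared point-to-point residual (known correspondences).
            for k in range(3):
                grad[k] += w_data * 2.0 * (x[k] - q[k]) / n
            # Penetration term: w_pen * mean(min(sdf, 0)^2) against the sphere.
            diff = [x[k] - hand_center[k] for k in range(3)]
            dist = math.sqrt(sum(v * v for v in diff))
            sdf = dist - hand_radius
            if sdf < 0.0 and dist > 1e-12:
                for k in range(3):
                    grad[k] += w_pen * 2.0 * sdf * diff[k] / (dist * n)
        # Support term: with translation only, the lowest model point never changes.
        grad[2] += w_sup * 2.0 * support_gap(model_pts, t)
        for k in range(3):
            t[k] -= lr * grad[k]
    return tuple(t)


# Usage: a 10 cm cube whose observed pose is biased toward the hand and into the table.
cube = [(sx * 0.05, sy * 0.05, sz * 0.05) for sx, sy, sz in product((-1, 1), repeat=3)]
t_obs = (0.03, 0.0, 0.03)
observed = transform(cube, t_obs)
hand_c, hand_r = (0.1, 0.0, 0.05), 0.07
t_ref = refine_translation(cube, observed, hand_c, hand_r, t0=t_obs)
print(t_ref, max_penetration(cube, t_ref, hand_c, hand_r), support_gap(cube, t_ref))
```

Fitting the observations alone leaves the cube inside both the hand and the table. The refined pose lifts it to rest almost on the table and clears the hand, while the data term keeps it close to the measurements. The weights set that trade-off.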