A Model-based Visual Contact Localization and Force Sensing System for Compliant Robotic Grippers

arXiv cs.CV / 5/4/2026


Key Points

  • The paper proposes a model-based visual contact localization and force sensing system that estimates grasp forces for deformable robotic grippers, helping robots avoid damaging delicate objects.
  • Rather than relying on end-to-end deep learning, which can be brittle when generalizing to new scenarios, the system extracts keypoints from wrist-mounted RGB-D camera images and uses them in an inverse finite element analysis (FEA) simulation that relates observed deformation to force.
  • An iterative contact localization module uses a deep learning-based online 3D reconstruction and pose estimation pipeline to update the contact location, remaining robust to visual occlusion and unseen objects.
  • In experiments with fin-ray-shaped soft grippers, the system achieved an average RMSE of 0.23 N (2.11% NRMSD) during the loading phase and 0.48 N (4.34% NRMSD) over the full grasp process, across different objects and conditions.
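The keypoint-to-force idea can be illustrated with a deliberately simplified sketch. It assumes small, linear-elastic deformation, so that observed keypoint displacements u relate to a single contact force F at node j through a compliance matrix, u ≈ C[:, j] · F. A 1D spring chain (with hypothetical stiffness, node count, and keypoint indices) stands in for the fin-ray FEA mesh the paper simulates in SOFA, and a residual-based search over candidate contact nodes stands in for the paper's reconstruction-based contact localization. All function names are hypothetical and do not come from the paper.

```python
import numpy as np


def chain_compliance(n_nodes: int, k: float) -> np.ndarray:
    """Compliance matrix C = K^-1 of a 1D spring chain fixed at its base.

    Spring i links node i to node i-1 (spring 0 links node 0 to the fixed base),
    each with stiffness k, so C[i, j] = (min(i, j) + 1) / k.
    """
    K = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes):
        K[i, i] += k  # spring i acts on node i
        if i > 0:  # ...and on node i-1, coupling the two nodes
            K[i - 1, i - 1] += k
            K[i - 1, i] -= k
            K[i, i - 1] -= k
    return np.linalg.inv(K)


def fit_force(C_obs: np.ndarray, u_obs: np.ndarray, node: int) -> tuple[float, float]:
    """Least-squares force magnitude at `node` plus the norm of the fit residual."""
    c = C_obs[:, node]  # keypoint displacements per unit force at `node`
    force = float(c @ u_obs / (c @ c))
    residual = float(np.linalg.norm(u_obs - c * force))
    return force, residual


def localize_and_estimate(C_obs, u_obs, candidates):
    """Pick the candidate contact node whose best-fit force explains u_obs best."""
    best = min(candidates, key=lambda j: fit_force(C_obs, u_obs, j)[1])
    force, _ = fit_force(C_obs, u_obs, best)
    return best, force


# Hypothetical setup: 8 nodes, k = 100 N/m, keypoints observed at 4 of them.
C = chain_compliance(n_nodes=8, k=100.0)
keypoints = [1, 3, 5, 7]
C_obs = C[keypoints, :]

# Simulate the camera observation: 2.0 N applied at node 4.
u_obs = C_obs[:, 4] * 2.0
node, force = localize_and_estimate(C_obs, u_obs, candidates=range(8))
print(f"contact node {node}, estimated force {force:.3f} N")
```

With these inputs the search returns node 4 and a force of 2.000 N. In this sketch, fitting the force at a wrong candidate node leaves a larger residual, which shows why a force estimate from deformation depends on knowing where contact occurs.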

Abstract

Grasp force estimation can help prevent robots from damaging delicate objects during manipulation and can improve learning-based robotic control. Integrating force sensing into deformable grippers involves trade-offs among cost, complexity, mechanical robustness, and performance. As RGB-D wrist cameras are increasingly integrated into robotic systems for control, camera-based techniques offer a promising route to indirect visual force estimation. Current approaches mostly rely on end-to-end deep learning, which can be brittle when generalizing to new scenarios, while existing model-based approaches are unsuited to grasping and to modern gripper geometries. To address these challenges, we developed a model-based visual force sensing approach that integrates iterative contact localization and generalizes to unseen objects. The system extracts structural keypoints from wrist-camera RGB-D images of deforming fin-ray-shaped soft grippers and uses them to define the parameters of an inverse finite element analysis simulation in the Simulation Open Framework Architecture (SOFA). The iterative contact localization subsystem uses a deep learning-based online 3D reconstruction and pose estimation pipeline to dynamically update the contact location, and is robust to visual occlusion and unseen objects. Our system achieved an average root mean square error (RMSE) of 0.23 N and a normalized root mean square deviation (NRMSD) of 2.11% during the load phase, and 0.48 N and 4.34% over the entire grasping process, when interacting with different objects under various conditions, showcasing its potential for real-time model-based indirect force sensing of soft grippers.
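The two reported metrics can be computed as in the Python sketch below. It assumes NRMSD is the RMSE divided by the range (max minus min) of the reference forces; the abstract does not state the normalization, and normalizing by the mean or the maximum is also common. The sample force values are made up for illustration and are not data from the paper.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean square error between predicted and reference forces (same units)."""
    if len(predicted) != len(reference) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def nrmsd(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """RMSE normalized by the reference range (an assumed convention), as a fraction."""
    span = max(reference) - min(reference)
    if span == 0:
        raise ValueError("reference range is zero; NRMSD is undefined")
    return rmse(predicted, reference) / span


# Hypothetical force readings in newtons (not data from the paper).
reference_n = [0.0, 2.0, 4.0, 6.0, 8.0]
estimated_n = [0.1, 2.2, 3.9, 6.3, 7.8]
print(f"RMSE = {rmse(estimated_n, reference_n):.3f} N")
print(f"NRMSD = {100 * nrmsd(estimated_n, reference_n):.2f} %")
```

Under the range-normalization assumption, the reported pairs imply a reference force range of roughly 11 N (0.23 / 0.0211 ≈ 10.9 N and 0.48 / 0.0434 ≈ 11.1 N); a different normalization would imply a different scale.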