SuperGrasp: Single-View Object Grasping via Superquadric Similarity Matching, Evaluation, and Refinement

arXiv cs.RO / 4/1/2026


Key Points

  • The paper introduces SuperGrasp, a two-stage framework for robotic single-view object grasping using parallel-jaw grippers that separates initial grasp generation from later evaluation/refinement.
  • It proposes a Similarity Matching Module that retrieves grasp candidates by matching an input single-view point cloud to a precomputed primitive dataset using superquadric coefficients.
  • For refinement, it presents E-RNet, an end-to-end network that enlarges the grasp-aware region and uses the initial grasp closure region as a local anchor to improve stability and validity.
  • The authors build a primitive dataset (1.5k primitives) and a large training dataset (100k stable grasp labels across 124 objects) to improve generalization.
  • Experiments in both simulation and real-world settings show stable grasping and strong generalization to new scenes and novel objects.

Abstract

Robotic grasping from single-view observations remains a critical challenge in manipulation. Existing methods still struggle to generate stable and valid grasp poses when confronted with incomplete geometric information. To address these limitations, we propose SuperGrasp, a novel two-stage framework for single-view grasping with parallel-jaw grippers that decomposes the grasping process into initial grasp pose generation and subsequent grasp evaluation and refinement. In the first stage, we introduce a Similarity Matching Module that efficiently retrieves grasp candidates by matching the input single-view point cloud with a pre-computed primitive dataset based on superquadric coefficients. In the second stage, we propose E-RNet, an end-to-end network that expands the grasp-aware region and takes the initial grasp closure region as a local anchor region, enabling more accurate and reliable evaluation and refinement of grasp candidates. To enhance generalization, we construct a primitive dataset containing 1.5k primitives for similarity matching and collect a large-scale point cloud dataset with 100k stable grasp labels from 124 objects for network training. Extensive experiments in both simulation and real-world environments demonstrate that our method achieves stable grasping performance and strong generalization across varying scenes and novel objects.
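To make the superquadric matching idea concrete: a superquadric is defined by a small coefficient vector (three scale parameters and two shape exponents), and its inside-outside function tells whether a point lies inside, on, or outside the surface. The sketch below is an illustrative approximation only, not the paper's actual retrieval method; the function names, the 5-D coefficient vector layout, and the use of plain Euclidean distance as the similarity metric are all assumptions for demonstration.

```python
import numpy as np

def superquadric_F(points, scale, eps):
    """Inside-outside function of a superquadric in its canonical frame.
    F < 1 inside the surface, F = 1 on it, F > 1 outside.
    scale = (a1, a2, a3), eps = (eps1, eps2)."""
    a1, a2, a3 = scale
    e1, e2 = eps
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    term_xy = (np.abs(x / a1) ** (2 / e2) + np.abs(y / a2) ** (2 / e2)) ** (e2 / e1)
    return term_xy + np.abs(z / a3) ** (2 / e1)

def match_primitive(coeffs, primitive_db):
    """Toy similarity matching: return the index of the nearest primitive
    by Euclidean distance over the coefficient vector (a1, a2, a3, eps1, eps2).
    (The paper's real similarity measure may differ.)"""
    d = np.linalg.norm(primitive_db - coeffs[None, :], axis=1)
    return int(np.argmin(d))

# A fitted coefficient vector for the observed point cloud would be
# matched against the precomputed primitive dataset like so:
db = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],    # sphere-like primitive
               [2.0, 1.0, 1.0, 0.2, 0.2]])   # box-like primitive
query = np.array([1.9, 1.0, 1.0, 0.3, 0.2])
idx = match_primitive(query, db)              # nearest: the box-like primitive
```

With eps1 = eps2 = 1 the superquadric reduces to an ellipsoid, while exponents near 0 approach a box, which is why this compact parameterization covers a useful range of object shapes for primitive retrieval.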
