GAPG: Geometry Aware Push-Grasping Synergy for Goal-Oriented Manipulation in Clutter

arXiv cs.RO / 3/24/2026

Key Points

  • The paper introduces a geometry-aware push-grasp synergy framework for goal-oriented robotic manipulation in cluttered scenes where grasping alone often fails due to stacking and occlusion.
  • It uses point cloud data to jointly evaluate grasp feasibility/stability and predict how candidate pushing actions will change future graspable space.
  • A grasp evaluation module checks geometric relationships between the gripper point cloud and points in its closing region to assess whether a grasp is robust.
  • A push evaluation module predicts how each candidate push changes future graspable space, so the robot can choose pushes that create safe, effective grasp opportunities; the authors present this as more stable and efficient than prior push-grasp methods.
  • Experiments in both simulation and real-world settings show the approach generalizes well to real scenes and unseen objects.
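
The grasp evaluation idea in the points above can be illustrated with a small geometric check. This is a minimal sketch under stated assumptions, not the paper's module: the scene is reduced to planar (x, y) points, the gripper is modeled as two box-shaped fingers rather than a gripper point cloud, and every name and default value (`evaluate_grasp`, `make_block`, `opening`, `finger_thickness`, `finger_width`, `min_enclosed`) is hypothetical.

```python
import math


def make_block(cx, cy, half_cells=4, step=0.005):
    """Hypothetical helper: sample a square object as a grid of planar points."""
    cells = range(-half_cells, half_cells + 1)
    return [(cx + i * step, cy + j * step) for i in cells for j in cells]


def to_gripper_frame(point, center, yaw):
    """Express a planar point in the gripper frame (x is the closing axis)."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx + s * dy, -s * dx + c * dy)


def evaluate_grasp(points, center, yaw, opening=0.08, finger_thickness=0.01,
                   finger_width=0.02, min_enclosed=5):
    """Assumed scoring rule: a grasp is feasible if no scene point lies inside
    a finger box and enough points lie in the closing region between them."""
    half_open = opening / 2
    enclosed, collisions = [], 0
    for p in points:
        gx, gy = to_gripper_frame(p, center, yaw)
        if abs(gy) > finger_width / 2:
            continue  # outside the strip swept by the fingers
        if abs(gx) < half_open:
            enclosed.append(gx)  # between the fingers
        elif abs(gx) <= half_open + finger_thickness:
            collisions += 1  # inside one of the finger boxes
    if enclosed:
        # Stability proxy: how well the enclosed points are centered.
        offset = abs(sum(enclosed) / len(enclosed))
        centering = max(0.0, 1.0 - offset / half_open)
    else:
        centering = 0.0
    return {
        "feasible": collisions == 0 and len(enclosed) >= min_enclosed,
        "enclosed": len(enclosed),
        "collisions": collisions,
        "centering": centering,
    }


target = make_block(0.0, 0.0)
cluttered = target + make_block(0.065, 0.0)  # neighbor placed next to the target
result_along_x = evaluate_grasp(cluttered, (0.0, 0.0), 0.0)
result_along_y = evaluate_grasp(cluttered, (0.0, 0.0), math.pi / 2)
```

In this toy scene the neighbor sits where one finger would land when the gripper closes along x, so that grasp is rejected, while closing along y avoids it. A learned or full 3D evaluation would replace the box tests, but the question asked of the geometry is the same.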

Abstract

Grasping target objects is a fundamental skill for robotic manipulation, but in cluttered environments with stacked or occluded objects, a single-step grasp is often insufficient. To address this, previous work has introduced pushing as an auxiliary action to create graspable space. However, these methods often struggle with both stability and efficiency because they neglect the scene's geometric information, which is essential for evaluating grasp robustness and ensuring that pushing actions are safe and effective. To this end, we propose a geometry-aware push-grasp synergy framework that leverages point cloud data to integrate grasp and push evaluation. Specifically, the grasp evaluation module analyzes the geometric relationship between the gripper's point cloud and the points enclosed within its closing region to determine grasp feasibility and stability. Guided by this, the push evaluation module predicts how pushing actions influence future graspable space, enabling the robot to select actions that reliably transform non-graspable states into graspable ones. By jointly reasoning about geometry in both grasping and pushing, our framework achieves safer, more efficient, and more reliable manipulation in cluttered settings. Our method is extensively tested in simulation and real-world environments in various scenarios. Experimental results demonstrate that our model generalizes well to real-world scenes and unseen objects.
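
The abstract's claim that a push can turn a non-graspable state into a graspable one can be made concrete with a toy planar model. The sketch below is an assumption-heavy stand-in, not the authors' push evaluation module: objects are discs `(x, y, radius)`, a push is a rigid translation of a single obstacle with no contact dynamics, and "graspable space" is counted as the number of gripper yaws whose two finger landing spots are clear. All function names and parameters are hypothetical.

```python
import math


def free_grasp_angles(target, obstacles, finger_offset=0.05,
                      finger_radius=0.01, n_angles=8):
    """Count grasp yaws whose two finger landing spots avoid every obstacle."""
    free = 0
    for k in range(n_angles):
        yaw = math.pi * k / n_angles  # parallel-jaw grasps repeat after pi
        clear = True
        for sign in (1, -1):
            fx = target[0] + sign * finger_offset * math.cos(yaw)
            fy = target[1] + sign * finger_offset * math.sin(yaw)
            for ox, oy, radius in obstacles:
                if math.hypot(fx - ox, fy - oy) < radius + finger_radius:
                    clear = False
        if clear:
            free += 1
    return free


def apply_push(obstacles, index, direction, distance):
    """Assumed push model: translate one obstacle rigidly, ignoring contacts."""
    ox, oy, radius = obstacles[index]
    moved = (ox + distance * math.cos(direction),
             oy + distance * math.sin(direction),
             radius)
    return obstacles[:index] + [moved] + obstacles[index + 1:]


def best_push(target, obstacles, distance=0.10, n_directions=8):
    """Pick the push with the largest gain in free grasp yaws.

    Returns (free_before, (gain, obstacle_index, direction)); the second
    element is None when there are no obstacles to push.
    """
    before = free_grasp_angles(target, obstacles)
    best = None
    for i in range(len(obstacles)):
        for k in range(n_directions):
            direction = 2 * math.pi * k / n_directions
            after = free_grasp_angles(
                target, apply_push(obstacles, i, direction, distance))
            gain = after - before
            if best is None or gain > best[0]:
                best = (gain, i, direction)
    return before, best


target = (0.0, 0.0)
obstacles = [(0.05, 0.0, 0.03)]  # one disc crowding the target on its +x side
free_before, choice = best_push(target, obstacles)
```

Scoring a push by the change in free grasp yaws mirrors the coupling described in the abstract: the push is judged by what it does to later grasp feasibility, not by how far it moves an object. A realistic version would need a contact-aware prediction of where objects end up and would also check that the push itself is safe.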