Towards a Multi-Embodied Grasping Agent

arXiv cs.RO / 4/17/2026


Key Points

  • The paper proposes a multi-embodied grasping agent approach that aims to generalize across different gripper designs by leveraging shared geometric and kinematic structure rather than relying on implicit learning alone.
  • It introduces a data-efficient, flow-based, equivariant grasp synthesis architecture that can handle varying gripper types and degrees of freedom using only gripper and scene geometry.
  • The authors re-implemented all modules from scratch in JAX, enabling batching across scenes, grippers, and grasps for smoother training, better performance, and faster inference.
  • The accompanying dataset is large and diverse, covering grippers from humanoid hands to parallel-jaw grippers, with 25,000 scenes and 20 million grasps.
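
The batching claim above hinges on JAX's `vmap` transform, which lifts a per-sample function over extra batch axes without rewriting it. The sketch below is illustrative only: `grasp_score` is a hypothetical stand-in for the paper's model (which is not reproduced here), but the nested-`vmap` pattern for batching over scenes, grippers, and grasps is the standard JAX idiom.

```python
import jax
import jax.numpy as jnp

# Hypothetical per-sample score: one scene point cloud, one gripper
# keypoint set, one candidate grasp translation. A stand-in for the
# paper's actual model, used only to show the batching pattern.
def grasp_score(scene_pts, gripper_pts, grasp_t):
    placed = gripper_pts + grasp_t                                   # (G, 3)
    d = jnp.linalg.norm(placed[:, None, :] - scene_pts[None, :, :], axis=-1)
    return -jnp.min(d, axis=-1).mean()        # higher = gripper closer to scene

# Nested vmap: innermost over grasps, then grippers, then scenes.
batched = jax.vmap(                                                  # scenes
    jax.vmap(                                                        # grippers
        jax.vmap(grasp_score, in_axes=(None, None, 0)),              # grasps
        in_axes=(None, 0, None)),
    in_axes=(0, None, None))

scenes   = jnp.zeros((4, 128, 3))   # 4 scenes, 128 points each
grippers = jnp.zeros((3, 16, 3))    # 3 gripper geometries, 16 keypoints each
grasps   = jnp.zeros((8, 3))        # 8 candidate grasp translations
scores = jax.jit(batched)(scenes, grippers, grasps)
print(scores.shape)                 # (4, 3, 8): one score per combination
```

Because `vmap` composes with `jit`, the whole scene × gripper × grasp grid is evaluated in one fused, compiled call rather than a Python loop, which is the "smoother training and faster inference" mechanism the bullet refers to.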

Abstract

Multi-embodiment grasping focuses on developing approaches that exhibit generalist behavior across diverse gripper designs. Existing methods often learn the kinematic structure of the robot implicitly and face challenges due to the difficulty of sourcing the required large-scale data. In this work, we present a data-efficient, flow-based, equivariant grasp synthesis architecture that can handle different gripper types with variable degrees of freedom and successfully exploit the underlying kinematic model, deducing all necessary information solely from the gripper and scene geometry. Unlike previous equivariant grasping methods, we re-implemented all modules from the ground up in JAX and provide a model with batching capabilities over scenes, grippers, and grasps, resulting in smoother learning, improved performance, and faster inference. Our dataset encompasses grippers ranging from humanoid hands to parallel-jaw grippers and includes 25,000 scenes and 20 million grasps.
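The abstract's central structural claim is equivariance: rotating the scene should rotate the synthesized grasps correspondingly, rather than the model having to relearn every orientation from data. The toy check below illustrates what that property means, using a trivially equivariant stand-in (the scene centroid) in place of the paper's actual network, which is far richer; the function and names here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def grasp_point(scene_pts):
    # Toy "grasp synthesizer": predicts the scene centroid. The centroid
    # commutes with rotation, so this map is rotation-equivariant by
    # construction; it only serves to demonstrate the property.
    return scene_pts.mean(axis=0)

def rot_z(theta):
    # Rotation matrix about the z-axis.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))      # a random scene point cloud
R = rot_z(0.7)

lhs = grasp_point(pts @ R.T)        # rotate the scene, then predict
rhs = R @ grasp_point(pts)          # predict, then rotate the prediction
print(np.allclose(lhs, rhs))        # True: f(R·x) == R·f(x)
```

An equivariant architecture builds this commutation property into the network itself, which is why such models tend to need far less data than ones that must learn orientation invariances implicitly.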