R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation

arXiv cs.RO / 4/30/2026


Key Points

  • The paper introduces R2RGen, a simulator- and rendering-free framework for generating real-to-real 3D data to support spatially generalized robotic manipulation.
  • It targets the sim-to-real gap and limitations of prior data-generation methods that often assume fixed bases or fixed camera viewpoints.
  • R2RGen uses a three-stage pipeline: parsing scene/trajectory from source demonstrations across camera setups, augmenting object and robot positions via group-wise backtracking, and performing camera-aware post-processing to match real 3D sensor distributions.
  • Experiments suggest R2RGen substantially improves data efficiency and shows strong potential for scaling and for application to mobile manipulation.
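
To make the three stages concrete, here is a minimal, hypothetical sketch of how such a real-to-real augmentation loop could be organized. None of the function names, parameters, or numeric ranges below come from the paper; the geometry (one random planar transform replayed on both observation and action) and the crude range-based visibility cutoff are simplifying assumptions for illustration only.

```python
import numpy as np

def parse_demo(points, actions):
    """Stage 1 (hypothetical): place one source demonstration's point cloud
    and end-effector trajectory in a shared 3D frame."""
    # A real parser would segment objects and split the trajectory into
    # object-centric phases; here we simply group the raw arrays.
    return {"scene": np.asarray(points), "trajectory": np.asarray(actions)}

def augment(parsed, rng):
    """Stage 2 (hypothetical): apply one random planar transform to the
    scene and replay the trajectory under the same transform, so the
    observation-action pair stays geometrically consistent."""
    theta = rng.uniform(-np.pi / 6, np.pi / 6)          # random yaw
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    t = np.append(rng.uniform(-0.1, 0.1, size=2), 0.0)  # random xy shift
    return {"scene": parsed["scene"] @ R.T + t,
            "trajectory": parsed["trajectory"] @ R.T + t}

def postprocess(sample, cam_origin, max_range=2.0):
    """Stage 3 (hypothetical stand-in): keep only points a depth sensor at
    cam_origin could plausibly return (here: a simple range cutoff)."""
    dist = np.linalg.norm(sample["scene"] - cam_origin, axis=1)
    return {"scene": sample["scene"][dist < max_range],
            "trajectory": sample["trajectory"]}

# Usage: generate several spatially diverse samples from one demonstration.
rng = np.random.default_rng(0)
demo = parse_demo(rng.normal(size=(1024, 3)), rng.normal(size=(50, 3)))
generated = [postprocess(augment(demo, rng), cam_origin=np.array([0.0, 0.0, 1.5]))
             for _ in range(10)]
```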

Abstract

Toward generalized robotic manipulation, spatial generalization is the most fundamental capability: the policy must work robustly under different spatial distributions of objects, the environment, and the agent itself. Achieving this typically requires collecting substantial human demonstrations covering diverse spatial configurations to train a generalized visuomotor policy via imitation learning. Prior works explore a promising direction that leverages data generation to obtain abundant, spatially diverse data from minimal source demonstrations. However, most approaches face a significant sim-to-real gap and are often limited to constrained settings, such as fixed-base scenarios and predefined camera viewpoints. In this paper, we propose a real-to-real 3D data generation framework (R2RGen) that directly augments point-cloud observation-action pairs to generate real-world data. R2RGen is simulator- and rendering-free, and therefore efficient and plug-and-play. Specifically, we propose a unified three-stage framework that (1) pre-processes source demonstrations captured under different camera setups into a shared 3D space via scene and trajectory parsing; (2) augments object and robot positions with a group-wise backtracking strategy; and (3) aligns the distribution of generated data with that of real-world 3D sensors using camera-aware post-processing. Empirically, R2RGen substantially enhances data efficiency in extensive experiments and demonstrates strong potential for scaling and for application to mobile manipulation.
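
A key difficulty the camera-aware post-processing stage must handle is that, after objects are rearranged in a point cloud, some points would no longer be visible to the real depth camera that produced the source data. The abstract does not specify the exact mechanism, so as an illustrative assumption, the sketch below uses a standard visibility test, the spherical-flipping hidden-point-removal operator of Katz et al. (2007), to cull points a camera at a given pose could not have observed.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hidden_point_removal(points, camera, radius_scale=100.0):
    """Return indices of points visible from `camera`, via spherical
    flipping plus a convex hull (Katz et al., 2007). Whether R2RGen uses
    this exact operator is an assumption; it is one standard choice."""
    p = points - camera                             # put the camera at the origin
    norms = np.linalg.norm(p, axis=1, keepdims=True)
    radius = radius_scale * norms.max()             # flipping-sphere radius
    flipped = p + 2.0 * (radius - norms) * (p / norms)
    # Hull vertices of the flipped cloud plus the camera point mark visibility.
    hull = ConvexHull(np.vstack([flipped, np.zeros((1, 3))]))
    idx = np.unique(hull.vertices)
    return idx[idx < len(points)]                   # drop the camera vertex

# Usage: cull points a depth camera at (0, 0, 1.5) could not have observed.
rng = np.random.default_rng(0)
cloud = rng.uniform(-0.5, 0.5, size=(2048, 3))
visible = cloud[hidden_point_removal(cloud, camera=np.array([0.0, 0.0, 1.5]))]
```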