Point Bridge: 3D Representations for Cross Domain Policy Learning

arXiv cs.RO / 3/26/2026


Key Points

  • The paper introduces Point Bridge, a framework for training robot manipulation agents using only synthetic simulation data to enable zero-shot sim-to-real policy transfer despite the visual domain gap.
  • Point Bridge uses domain-agnostic, point-based representations extracted automatically by Vision-Language Models (VLMs), avoiding the need for explicit visual or object-level alignment between sim and real.
  • It combines transformer-based policy learning with efficient inference-time pipelines to produce policies that can operate in real-world manipulation tasks.
  • Reported gains reach up to 44% in zero-shot sim-to-real transfer; co-training with small sets of real demonstrations further improves results, up to 66%, across both single-task and multitask settings.
  • The work is positioned as a step toward more data-efficient “robot foundation model” training by making synthetic data far more transferable to reality.
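To make the core idea concrete, here is a minimal sketch of a domain-agnostic point pipeline of the kind the key points describe: 2D keypoints (e.g. proposed by a VLM) are back-projected into a 3D point set that looks the same whether the frame came from simulation or a real camera, and sim and real demonstrations are mixed into co-training batches. All function names, the pinhole-camera back-projection, and the fixed mixing ratio are illustrative assumptions, not details from the paper.

```python
import random

def keypoints_to_3d(keypoints_uv, depth, intrinsics):
    """Back-project 2D keypoints (pixel coords) into the 3D camera frame
    using a depth map and pinhole intrinsics (fx, fy, cx, cy).
    Hypothetical stand-in for the paper's point-based representation:
    the output is a bare point set with no textures, so the same policy
    input format applies to sim and real observations."""
    fx, fy, cx, cy = intrinsics
    points = []
    for u, v in keypoints_uv:
        z = depth[int(v)][int(u)]  # depth indexed as [row][col]
        points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

def cotraining_batches(sim_demos, real_demos, batch_size, real_fraction, rng):
    """Yield batches mixing synthetic and real demonstrations at a fixed
    ratio. The ratio and sampling scheme are assumptions for illustration;
    the paper only states that co-training on small real sets helps."""
    n_real = round(batch_size * real_fraction)
    n_sim = batch_size - n_real
    while True:
        yield ([rng.choice(sim_demos) for _ in range(n_sim)] +
               [rng.choice(real_demos) for _ in range(n_real)])
```

A policy network (a transformer, in the paper's setup) would then consume these point sets directly, which is what removes the need for visual or object-level alignment between domains.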

Abstract

Robot foundation models are beginning to deliver on the promise of generalist robotic agents, yet progress remains constrained by the scarcity of large-scale real-world manipulation datasets. Simulation and synthetic data generation offer a scalable alternative, but their usefulness is limited by the visual domain gap between simulation and reality. In this work, we present Point Bridge, a framework that leverages unified, domain-agnostic point-based representations to unlock synthetic datasets for zero-shot sim-to-real policy transfer, without explicit visual or object-level alignment. Point Bridge combines automated point-based representation extraction via Vision-Language Models (VLMs), transformer-based policy learning, and efficient inference-time pipelines to train capable real-world manipulation agents using only synthetic data. With additional co-training on small sets of real demonstrations, Point Bridge further improves performance, substantially outperforming prior vision-based sim-and-real co-training methods. It achieves up to 44% gains in zero-shot sim-to-real transfer and up to 66% with limited real data across both single-task and multitask settings. Videos of the robot are best viewed at: https://pointbridge3d.github.io/