SpaCeFormer: Fast Proposal-Free Open-Vocabulary 3D Instance Segmentation
arXiv cs.CV / 4/23/2026
Key Points
- The paper introduces SpaCeFormer, a proposal-free approach to open-vocabulary 3D instance segmentation designed for robotics and AR/VR use cases.
- SpaCeFormer reaches real-time speed at 0.14 seconds per scene, addressing the latency bottleneck of multi-stage 2D+3D pipelines, which can take hundreds of seconds per scene.
- The authors also release SpaCeFormer-3M, a large open-vocabulary 3D instance segmentation dataset with 3.0M multi-view-consistent captions covering 604K instances across 7.4K scenes, constructed via multi-view mask clustering and VLM captioning.
- The method uses spatial window attention plus Morton-curve serialization for coherent 3D features, and a RoPE-enhanced decoder that predicts instance masks directly from learned queries without external region proposals.
- Experiments show strong improvements: 11.1 zero-shot mAP on ScanNet200 (a 2.8x improvement over the prior best proposal-free method), and 22.9 and 24.1 mAP on ScanNet++ and Replica respectively, surpassing prior methods, including those that use multi-view 2D inputs.
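
To make the Morton-curve serialization mentioned in the key points concrete, here is a minimal sketch of the general technique, not the authors' implementation. It quantizes 3D points to a voxel grid, interleaves the bits of the integer coordinates into a Morton (Z-order) code, and sorts points by that code so that neighbors in the 1D sequence tend to be neighbors in space. The function names (`morton3d`, `serialize`, `windows`), the `voxel_size` and `bits` parameters, and the fixed-size chunking into windows are all illustrative assumptions; the summary does not say how the paper combines serialization with its spatial window attention.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Morton code.

    Bit i of x goes to position 3*i, y to 3*i + 1, z to 3*i + 2.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def serialize(points: Sequence[Point], voxel_size: float = 0.05,
              bits: int = 10) -> List[int]:
    """Return point indices ordered along the Morton curve (hypothetical helper)."""
    if not points:
        return []
    # Shift coordinates so the smallest value on each axis maps to voxel 0.
    mins = [min(p[axis] for p in points) for axis in range(3)]
    max_index = (1 << bits) - 1
    keyed = []
    for idx, p in enumerate(points):
        # Quantize to integer voxel coordinates, clamped to the bit budget.
        ix, iy, iz = (
            min(int((p[axis] - mins[axis]) / voxel_size), max_index)
            for axis in range(3)
        )
        keyed.append((morton3d(ix, iy, iz, bits), idx))
    keyed.sort()
    return [idx for _, idx in keyed]


def windows(order: Sequence[int], window_size: int) -> List[List[int]]:
    """Chunk a serialized order into fixed-size groups (an assumed windowing scheme)."""
    return [list(order[i:i + window_size])
            for i in range(0, len(order), window_size)]


if __name__ == "__main__":
    pts = [(3.0, 3.0, 3.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    order = serialize(pts, voxel_size=1.0)
    print(order)              # indices sorted along the Z-order curve
    print(windows(order, 2))  # groups of spatially nearby points
```

Sorting by Morton code is a cheap way to turn an unordered point cloud into a sequence with reasonable spatial locality, which is why it is a common preprocessing step before attention over local groups of points.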