FAST3DIS: Feed-forward Anchored Scene Transformer for 3D Instance Segmentation
arXiv cs.CV / 3/30/2026
Key Points
- The paper introduces FAST3DIS, an end-to-end feed-forward Transformer approach for 3D instance segmentation that avoids the common “lift-and-cluster” pipeline used by many prior feed-forward 3D reconstruction methods.
- FAST3DIS uses a 3D-anchored, query-based Transformer with a learned 3D anchor generator and anchor-sampling cross-attention to project object queries into multi-view feature maps for efficient, view-consistent instance prediction.
- The method retains the zero-shot geometric priors of a depth backbone while learning instance-specific semantics end to end, rather than relying on non-differentiable clustering.
- It adds dual-level regularization combining multi-view contrastive learning with a dynamically scheduled spatial overlap penalty to prevent query collisions and improve boundary precision.
- Experiments on complex indoor 3D datasets show competitive segmentation accuracy with improved memory scalability and faster inference than clustering-based state-of-the-art methods.
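
To make the anchor-sampling and overlap-penalty ideas in the key points more concrete, here is a minimal, stdlib-only Python sketch. It is not the paper's implementation. It assumes a pinhole camera model, replaces learned attention weights with uniform averaging over the views where an anchor is visible, and uses a simple pairwise-product overlap term with a linear warm-up schedule. Every name here (`project_anchor`, `bilinear_sample`, `sample_anchor_features`, `overlap_penalty`, `scheduled_weight`) and the layout of the `views` dictionaries are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [row][col][channel]


def project_anchor(anchor: Sequence[float], rotation: Sequence[Sequence[float]],
                   translation: Sequence[float], fx: float, fy: float,
                   cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Project a 3D world point to pixel coordinates (assumed pinhole model)."""
    # World -> camera frame: X_cam = R @ X_world + t
    cam = [sum(rotation[i][j] * anchor[j] for j in range(3)) + translation[i]
           for i in range(3)]
    if cam[2] <= 1e-6:  # point is behind (or on) the image plane
        return None
    return fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy


def bilinear_sample(features: FeatureMap, u: float, v: float) -> Optional[List[float]]:
    """Bilinearly interpolate a feature vector at (u, v); None if out of bounds."""
    height, width = len(features), len(features[0])
    if not (0.0 <= u <= width - 1 and 0.0 <= v <= height - 1):
        return None
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    ax, ay = u - x0, v - y0
    out = []
    for c in range(len(features[0][0])):
        top = features[y0][x0][c] * (1 - ax) + features[y0][x1][c] * ax
        bottom = features[y1][x0][c] * (1 - ax) + features[y1][x1][c] * ax
        out.append(top * (1 - ay) + bottom * ay)
    return out


def sample_anchor_features(anchor: Sequence[float],
                           views: Sequence[Dict]) -> Optional[List[float]]:
    """Gather one anchor's features across views.

    Uniform averaging is a stand-in for learned attention weights.
    """
    samples = []
    for view in views:
        uv = project_anchor(anchor, view["R"], view["t"], *view["K"])
        if uv is None:
            continue
        feat = bilinear_sample(view["features"], *uv)
        if feat is not None:
            samples.append(feat)
    if not samples:
        return None
    return [sum(channel) / len(samples) for channel in zip(*samples)]


def overlap_penalty(masks: Sequence[Sequence[float]]) -> float:
    """Sum over query pairs of the mean per-point product of soft masks.

    This is an assumed form of a spatial overlap penalty, not the paper's.
    """
    total = 0.0
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            products = [a * b for a, b in zip(masks[i], masks[j])]
            total += sum(products) / len(products)
    return total


def scheduled_weight(step: int, warmup_steps: int, max_weight: float) -> float:
    """Linear warm-up of the penalty weight (an assumed schedule)."""
    return max_weight * min(1.0, step / warmup_steps)
```

A query that samples features only at a handful of projected anchor locations touches far fewer pixels than dense cross-attention over every view, which is the intuition behind the efficiency claim. The overlap term is zero when two queries claim disjoint points and grows as their soft masks coincide.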