CompassAD: Intent-Driven 3D Affordance Grounding in Functionally Competing Objects
arXiv cs.RO / 4/3/2026
Key Points
- The paper introduces CompassAD, a new 3D affordance benchmarking setting focused on “confusable” multi-object scenes where multiple objects share the same affordance but only one fits the instruction context (e.g., choosing a knife over scissors for “cut the apple”).
- It formalizes Multi-Object Affordance Grounding under Intent-Driven Instructions, requiring a per-point affordance mask on the correct object within a cluttered point cloud, conditioned on implicit natural-language intent.
- The dataset covers 30 confusable object pairs across 16 affordance types, with 6,422 scenes and 88K+ query-answer pairs written to express implicit intent rather than to name object categories explicitly.
- The proposed CompassNet uses two modules—Instance-bounded Cross Injection (to avoid language-geometry “leakage” across object boundaries) and Bi-level Contrastive Refinement (to sharpen discrimination at both object-group and point levels).
- Experiments show strong results on both seen and unseen queries, and deployment on a real robot manipulator demonstrates that the method transfers to grasping in confusable multi-object scenes.
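
To make the task output described above concrete, here is a minimal sketch of what a per-point affordance mask looks like in a two-object scene and how a prediction on the wrong object would be penalized. Everything in it is an illustrative assumption rather than the paper's code: the `Scene` class, the toy eight-point scene, the 0.5 threshold, and the use of point-wise IoU as the score are all hypothetical choices, since the summary does not state which metrics the benchmark uses.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Scene:
    """A toy multi-object scene with one entry per point (hypothetical layout)."""
    instance_ids: List[int]  # which object each point belongs to
    gt_mask: List[int]       # 1 on the intended object's affordance region, else 0


def binarize(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Turn per-point affordance scores into a hard 0/1 mask."""
    return [1 if s >= threshold else 0 for s in scores]


def point_iou(pred: List[int], gt: List[int]) -> float:
    """Intersection-over-union between two per-point binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0


def touched_instances(pred: List[int], instance_ids: List[int]) -> Set[int]:
    """Return the set of object instances that the predicted mask lands on."""
    return {i for p, i in zip(pred, instance_ids) if p}


# Points 0-3 belong to a knife (instance 0), points 4-7 to scissors (instance 1).
# Assumed query: "cut the apple", so only the knife blade (points 0-1) is correct,
# even though both objects afford cutting.
scene = Scene(
    instance_ids=[0, 0, 0, 0, 1, 1, 1, 1],
    gt_mask=[1, 1, 0, 0, 0, 0, 0, 0],
)

# One prediction highlights the knife blade, the other the scissor blades.
scores_right = [0.9, 0.8, 0.2, 0.1, 0.1, 0.2, 0.1, 0.0]
scores_wrong = [0.1, 0.2, 0.1, 0.0, 0.9, 0.8, 0.2, 0.1]

pred_right = binarize(scores_right)
pred_wrong = binarize(scores_wrong)

iou_right = point_iou(pred_right, scene.gt_mask)
iou_wrong = point_iou(pred_wrong, scene.gt_mask)

print(iou_right, touched_instances(pred_right, scene.instance_ids))
print(iou_wrong, touched_instances(pred_wrong, scene.instance_ids))
```

In this toy setup the second prediction marks a plausible cutting region, but on the scissors instead of the knife, so its overlap with the ground-truth mask is zero. That is the kind of failure a confusable-object benchmark is meant to expose.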