CompassAD: Intent-Driven 3D Affordance Grounding in Functionally Competing Objects

arXiv cs.RO / 4/3/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper introduces CompassAD, a new 3D affordance benchmarking setting focused on “confusable” multi-object scenes where multiple objects share the same affordance but only one fits the instruction context (e.g., choosing a knife over scissors for “cut the apple”).
It formalizes Multi-Object Affordance Grounding under Intent-Driven Instructions, requiring a per-point affordance mask on the correct object within a cluttered point cloud, conditioned on implicit natural-language intent.
The dataset covers 30 confusing object pairs across 16 affordance types, with 6,422 scenes and 88K+ query-answer pairs specifically designed for implicit intent rather than explicit category names.
The proposed CompassNet uses two modules—Instance-bounded Cross Injection (to avoid language-geometry “leakage” across object boundaries) and Bi-level Contrastive Refinement (to sharpen discrimination at both object-group and point levels).
Experiments show strong results on both seen and unseen queries, and real-robot deployment on a manipulator demonstrates effective transfer to real grasping in confusing multi-object scenes.

Abstract

When told to "cut the apple," a robot must choose the knife over nearby scissors, despite both objects affording the same cutting function. In real-world scenes, multiple objects may share identical affordances, yet only one is appropriate under the given task context. We call such cases confusing pairs. However, existing 3D affordance methods largely sidestep this challenge by evaluating isolated single objects, often with explicit category names provided in the query. We formalize Multi-Object Affordance Grounding under Intent-Driven Instructions, a new 3D affordance setting that requires predicting a per-point affordance mask on the correct object within a cluttered multi-object point cloud, conditioned on implicit natural language intent. To study this problem, we construct CompassAD, the first benchmark centered on implicit intent in confusable multi-object scenes. It comprises 30 confusing object pairs spanning 16 affordance types, 6,422 scenes, and 88K+ query-answer pairs. Furthermore, we propose CompassNet, a framework that incorporates two dedicated modules tailored to this task. Instance-bounded Cross Injection (ICI) constrains language-geometry alignment within object boundaries to prevent cross-object semantic leakage. Bi-level Contrastive Refinement (BCR) enforces discrimination at both geometric-group and point levels, sharpening distinctions between target and confusable surfaces. Extensive experiments demonstrate state-of-the-art results on both seen and unseen queries, and deployment on a robotic manipulator confirms effective transfer to real-world grasping in confusing multi-object scenes.

Black Hat Asia

AI Business

90000 Tech Workers Got Fired This Year and Everyone Is Blaming AI but Thats Not the Whole Story

Dev.to

Microsoft’s $10 Billion Japan Bet Shows the Next AI Battleground Is National Infrastructure

Dev.to

TII Releases Falcon Perception: A 0.6B-Parameter Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts

MarkTechPost

Portable eye scanner powered by AI expands access to low-cost community screening

Reddit r/artificial

CompassAD: Intent-Driven 3D Affordance Grounding in Functionally Competing Objects

Key Points

Abstract

Related Articles

Black Hat Asia

90000 Tech Workers Got Fired This Year and Everyone Is Blaming AI but Thats Not the Whole Story

Microsoft’s $10 Billion Japan Bet Shows the Next AI Battleground Is National Infrastructure

TII Releases Falcon Perception: A 0.6B-Parameter Early-Fusion Transformer for Open-Vocabulary Grounding and Segmentation from Natural Language Prompts

Portable eye scanner powered by AI expands access to low-cost community screening

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer