RAAP: Retrieval-Augmented Affordance Prediction with Cross-Image Action Alignment

arXiv cs.RO / 2026/4/1

Key Points

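The paper proposes RAAP, a retrieval-augmented framework that predicts object affordances to support fine-grained robotic manipulation in unstructured environments.
RAAP improves robustness by decoupling static contact localization from dynamic action direction: it transfers contact points via dense correspondence and predicts action directions with a retrieval-augmented alignment model.
The model uses dual-weighted attention to consolidate multiple retrieved references, aiming to reduce failures caused by sparse or incomplete retrieval coverage.
In experiments, RAAP is trained on compact subsets of DROID and HOI4D with as few as tens of samples per task, and it generalizes consistently to unseen objects and categories.
The authors report zero-shot robotic manipulation results in both simulation and the real world, and provide a project website for reference and reproducibility.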
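The contact-transfer step described above can be pictured as nearest-neighbor matching of per-pixel features: a contact point annotated in a retrieved reference image is carried over to the query-image pixel whose descriptor is most similar. The sketch below is a minimal illustration under stated assumptions, not the RAAP implementation: the feature maps are tiny hand-written grids standing in for dense descriptors from some pretrained backbone, similarity is plain cosine similarity, and the helper names `cosine` and `transfer_contact` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A feature map is a grid (rows x cols) of descriptor vectors.
FeatureMap = List[List[Sequence[float]]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def transfer_contact(
    ref_feats: FeatureMap,
    query_feats: FeatureMap,
    ref_contact: Tuple[int, int],
) -> Tuple[Tuple[int, int], float]:
    """Map a (row, col) contact point from the reference to the query image.

    Hypothetical helper: takes the descriptor at the reference contact pixel
    and returns the query pixel with the highest cosine similarity, plus that score.
    """
    r, c = ref_contact
    target = ref_feats[r][c]
    best_pos, best_sim = (0, 0), -math.inf
    for i, row in enumerate(query_feats):
        for j, feat in enumerate(row):
            sim = cosine(target, feat)
            if sim > best_sim:
                best_pos, best_sim = (i, j), sim
    return best_pos, best_sim


if __name__ == "__main__":
    # Toy 2x3 descriptor grids: the distinctive descriptor [0, 1] sits at (0, 2)
    # in the reference and at (1, 0) (approximately) in the query.
    ref = [[[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]],
           [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2]]]
    query = [[[1.0, 0.0], [0.9, 0.1], [1.0, 0.2]],
             [[0.1, 1.0], [1.0, 0.0], [0.7, 0.3]]]
    pos, score = transfer_contact(ref, query, (0, 2))
    print(pos, round(score, 3))  # prints (1, 0) 0.995
```

Practical dense-correspondence pipelines often add refinements such as mutual nearest-neighbor checks or sub-pixel interpolation; this brute-force argmax only shows the core idea of carrying a static contact point across images.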

Abstract

Understanding object affordances is essential for enabling robots to perform purposeful and fine-grained interactions in diverse and unstructured environments. However, existing approaches either rely on retrieval, which is fragile due to sparsity and coverage gaps, or on large-scale models, which frequently mislocalize contact points and mispredict post-contact actions when applied to unseen categories, thereby hindering robust generalization. We introduce Retrieval-Augmented Affordance Prediction (RAAP), a framework that unifies affordance retrieval with alignment-based learning. By decoupling static contact localization and dynamic action direction, RAAP transfers contact points via dense correspondence and predicts action directions through a retrieval-augmented alignment model that consolidates multiple references with dual-weighted attention. Trained on compact subsets of DROID and HOI4D with as few as tens of samples per task, RAAP achieves consistent performance across unseen objects and categories, and enables zero-shot robotic manipulation in both simulation and the real world. Project website: https://github.com/SEU-VIPGroup/RAAP.
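The abstract says the alignment model consolidates multiple references with dual-weighted attention but does not define the two weights. As a rough illustration only, the sketch below assumes one weight is a scaled dot-product attention score between a query embedding and each reference embedding, and the other is each reference's retrieval score; the two are multiplied, renormalized, and used to average the references' action directions into a single unit vector. The embeddings, scores, the helper name `dual_weighted_direction`, and the multiplicative combination rule are all hypothetical and are not RAAP's actual model.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dual_weighted_direction(
    query_emb: Sequence[float],
    ref_embs: Sequence[Sequence[float]],
    ref_scores: Sequence[float],
    ref_dirs: Sequence[Sequence[float]],
) -> List[float]:
    """Fuse per-reference action directions into one unit vector.

    Hypothetical combination rule (not taken from the paper):
      weight 1 = scaled dot-product attention of the query over reference embeddings
      weight 2 = retrieval score of each reference, normalized to sum to 1
      final weight = weight 1 * weight 2, renormalized
    """
    if not ref_embs or len(ref_embs) != len(ref_scores) or len(ref_embs) != len(ref_dirs):
        raise ValueError("need the same non-zero number of embeddings, scores and directions")
    if any(s <= 0 for s in ref_scores):
        raise ValueError("retrieval scores must be positive")

    d = len(query_emb)
    logits = [sum(q * k for q, k in zip(query_emb, emb)) / math.sqrt(d) for emb in ref_embs]
    attn = softmax(logits)

    score_total = sum(ref_scores)
    retr = [s / score_total for s in ref_scores]

    raw = [a * r for a, r in zip(attn, retr)]
    z = sum(raw)
    weights = [w / z for w in raw]

    # Weighted sum of reference directions, then rescale to unit length.
    dim = len(ref_dirs[0])
    fused = [sum(w * v[i] for w, v in zip(weights, ref_dirs)) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in fused))
    if norm == 0.0:
        raise ValueError("reference directions cancel out; no dominant direction")
    return [x / norm for x in fused]


if __name__ == "__main__":
    query = [0.9, 0.1]                             # toy query embedding
    refs = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]    # toy reference embeddings
    scores = [0.9, 0.7, 0.2]                       # toy retrieval similarities
    dirs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]  # per-reference action directions
    print(dual_weighted_direction(query, refs, scores, dirs))
```

Combining the weights multiplicatively means a single poorly matched reference cannot dominate the fused direction unless it scores well on both criteria; whether RAAP weights or combines its references this way is not stated in the abstract.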