Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model

arXiv cs.RO / 4/28/2026



Key Points

The paper introduces Affordance-R1, a multimodal reinforcement-learning framework for affordance grounding that predicts action-relevant object regions for robots.
It argues that prior affordance models struggle with out-of-domain generalization because they lack Chain-of-Thought (CoT)-style reasoning, and it addresses this with a CoT-guided GRPO (Group Relative Policy Optimization) approach.
The method uses an affordance reward function that combines separate format, perception, and cognition rewards to steer RL optimization, and it is trained purely with reinforcement learning, without relying on explicit reasoning data.
The authors build an affordance-focused training dataset (ReasonAff) and report strong zero-shot generalization, open-world generalization, and emergent test-time reasoning behavior.
Code and the dataset are released on GitHub, enabling others to reproduce and extend the approach.
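
As a rough illustration of how format, perception, and cognition rewards could be combined into a single scalar, here is a minimal sketch. It is not the paper's implementation: the `<think>`/`<answer>` template, the use of bounding-box IoU as the perception signal, the externally supplied `cognition_score`, and the equal weights are all assumptions made for this example.

```python
import re

# Assumed output template (hypothetical): <think>...</think><answer>...</answer>
_TEMPLATE = re.compile(r"^<think>.+?</think>\s*<answer>.+?</answer>$", re.DOTALL)


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed template, else 0.0."""
    return 1.0 if _TEMPLATE.match(response.strip()) else 0.0


def box_iou(a, b) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def total_reward(response, pred_box, gt_box, cognition_score,
                 weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three reward terms (weights are an assumption).

    cognition_score is a placeholder in [0, 1] for whatever signal scores
    the reasoning itself; how it is computed is not specified here.
    """
    w_fmt, w_per, w_cog = weights
    return (w_fmt * format_reward(response)
            + w_per * box_iou(pred_box, gt_box)
            + w_cog * cognition_score)


if __name__ == "__main__":
    reply = "<think>The handle is graspable.</think><answer>[10, 10, 50, 50]</answer>"
    print(total_reward(reply, (10.0, 10.0, 50.0, 50.0), (10.0, 10.0, 50.0, 50.0), 0.5))
```

Keeping the three terms as separate functions makes it easy to log each one independently during training, which helps when diagnosing whether a policy is improving its output structure, its localization, or its reasoning.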

Abstract

Affordance grounding focuses on predicting the specific regions of objects that are associated with the actions to be performed by robots. It plays a vital role in human-robot interaction, human-object interaction, embodied manipulation, and embodied perception. Existing models often neglect the affordances shared among different objects because they lack Chain-of-Thought (CoT) reasoning abilities, which limits their out-of-domain (OOD) generalization and explicit reasoning capabilities. To address these challenges, we propose Affordance-R1, the first unified affordance grounding framework that integrates cognitive CoT-guided Group Relative Policy Optimization (GRPO) within a reinforcement learning paradigm. Specifically, we design a sophisticated affordance function, which contains format, perception, and cognition rewards, to effectively guide optimization directions. Furthermore, we construct a high-quality affordance-centric reasoning dataset, ReasonAff, to support training. Trained exclusively via reinforcement learning with GRPO and without explicit reasoning data, Affordance-R1 achieves robust zero-shot generalization and exhibits emergent test-time reasoning capabilities. Comprehensive experiments demonstrate that our model outperforms well-established methods and exhibits open-world generalization. To the best of our knowledge, Affordance-R1 is the first to integrate GRPO-based RL with reasoning into affordance reasoning. The code of our method and our dataset are released at https://github.com/hq-King/Affordance-R1.
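
For readers unfamiliar with GRPO, its core idea is that several responses are sampled for the same prompt and each response's reward is normalized against the others in its group, so no separate value network is needed. The sketch below shows only that normalization step, under stated assumptions: the function name and `eps` are illustrative, and the use of the population standard deviation is a choice made here (implementations differ on this detail). It is not taken from the Affordance-R1 code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the group sampled for one prompt.

    advantage_i = (r_i - mean(r)) / (std(r) + eps)
    eps guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical scalar rewards for four sampled responses to one prompt.
    group = [2.5, 1.0, 0.0, 1.5]
    for r, a in zip(group, group_relative_advantages(group)):
        print(f"reward={r:.2f} advantage={a:+.3f}")
```

Responses that score above their group's mean receive a positive advantage and are reinforced, while those below the mean are discouraged; a group in which every response earns the same reward contributes no learning signal.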


