BridgeACT: Bridging Human Demonstrations to Robot Actions via Unified Tool-Target Affordances

arXiv cs.RO / 4/28/2026


Key Points

  • BridgeACT is a new affordance-driven framework for learning robot manipulation directly from human videos, avoiding the need for any robot demonstration data.
  • The approach uses embodiment-agnostic intermediate affordance representations to bridge human demonstrations and executable robot actions.
  • It decomposes manipulation into two parts—identifying where to grasp and predicting how to move—by grounding task-relevant affordance regions and then predicting task-conditioned 3D motion affordances.
  • BridgeACT maps learned affordances to real robot behaviors via a grasping module and a lightweight closed-loop motion controller, supporting direct real-robot deployment (see the sketch after this list).
  • Experiments on real-world tasks indicate improved performance over prior baselines and strong generalization to unseen objects, scenes, and viewpoints.
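
For a rough sense of the data flow these points describe, here is a minimal Python sketch of the two-stage pipeline: ground where to grasp, predict how to move, then hand the result to a grasping step and a closed-loop controller. It is an illustration only; all module names, signatures, and the placeholder outputs are assumptions, not BridgeACT's actual code or API.

```python
# Hypothetical sketch of an affordance-driven pipeline (not the authors' code).
from dataclasses import dataclass

import numpy as np


@dataclass
class Affordance:
    """Embodiment-agnostic intermediate representation."""
    contact_point: np.ndarray      # (3,) where to grasp, in the scene frame
    motion_waypoints: np.ndarray   # (T, 3) how to move after grasping


def ground_affordance_region(rgbd: np.ndarray, task: str) -> np.ndarray:
    """Stage 1 (hypothetical): locate the task-relevant contact point."""
    # A real system would run a grounding model; here we return a dummy point.
    return np.zeros(3)


def predict_motion_affordance(contact: np.ndarray, task: str) -> np.ndarray:
    """Stage 2 (hypothetical): predict a task-conditioned 3D motion trajectory."""
    # Placeholder: a short straight-line lift above the contact point.
    heights = np.linspace(0.0, 0.1, num=10)
    return contact + np.stack([np.zeros(10), np.zeros(10), heights], axis=1)


def execute(rgbd: np.ndarray, task: str) -> Affordance:
    """Map the affordance to robot actions: grasp, then track waypoints closed-loop."""
    contact = ground_affordance_region(rgbd, task)
    waypoints = predict_motion_affordance(contact, task)
    aff = Affordance(contact, waypoints)
    # On a real robot: grasp(aff.contact_point), then servo through aff.motion_waypoints.
    return aff


if __name__ == "__main__":
    dummy_rgbd = np.zeros((480, 640, 4))
    print(execute(dummy_rgbd, "pick up the mug"))
```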

Abstract

Learning robot manipulation from human videos is appealing due to the scale and diversity of human demonstrations, but transferring such demonstrations to executable robot behavior remains challenging. Prior work either relies on robot data for downstream adaptation or learns affordance representations that remain at the perception level and do not directly support real-world execution. We present BridgeACT, an affordance-driven framework that learns robotic manipulation directly from human videos without requiring any robot demonstration data. Our key idea is to model affordance as an embodiment-agnostic intermediate representation that bridges human demonstrations and robot actions. BridgeACT decomposes manipulation into two complementary problems: where to grasp and how to move. To this end, BridgeACT first grounds task-relevant affordance regions in the current scene, and then predicts task-conditioned 3D motion affordances from human demonstrations. The resulting affordances are mapped to robot actions through a grasping module and a lightweight closed-loop motion controller, enabling direct deployment on real robots. In addition, we represent complex manipulation tasks as compositions of affordance operations, which allows a unified treatment of diverse tasks and object-to-object interactions. Experiments on real-world manipulation tasks show that BridgeACT outperforms prior baselines and generalizes to unseen objects, scenes, and viewpoints.
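
The abstract also describes representing complex tasks as compositions of affordance operations. The sketch below shows one plausible reading of that idea as an ordered sequence of operations executed in turn; the AffordanceOp container and the operation names are hypothetical and not taken from the paper.

```python
# Hedged illustration of composing a multi-step task from affordance operations.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AffordanceOp:
    name: str                  # e.g. "grasp", "move", "release" (assumed labels)
    run: Callable[[], None]    # callable that maps this affordance to robot motion


def compose(ops: List[AffordanceOp]) -> None:
    """Execute a complex task as an ordered composition of affordance operations."""
    for op in ops:
        print(f"executing affordance op: {op.name}")
        op.run()


if __name__ == "__main__":
    # Example: an object-to-object interaction such as "place the cup on the plate".
    task = [
        AffordanceOp("grasp cup", lambda: None),
        AffordanceOp("move cup above plate", lambda: None),
        AffordanceOp("release cup", lambda: None),
    ]
    compose(task)
```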