Vision-Based Hand Shadowing for Robotic Manipulation via Inverse Kinematics
arXiv cs.AI · March 13, 2026
💬 Opinion · Tools & Practical Usage · Models & Research
Key Points
- The paper presents an offline hand-shadowing and retargeting pipeline that uses a single egocentric RGB-D camera on 3D-printed glasses to control a 6-DOF robot via inverse kinematics in PyBullet.
- It detects 21 hand landmarks per hand with MediaPipe Hands, reconstructs 3D hand pose, transforms it into the robot frame, and solves a damped-least-squares IK problem to generate joint commands for the SO-ARM101.
- A gripper controller maps thumb-index geometry to grasp aperture using a four-level fallback, with actions previewed in a physics simulation before replay on the physical robot through the LeRobot framework.
- In evaluation, the structured pick-and-place benchmark achieves a 90% success rate, while real-world unstructured environments with occlusion and clutter reduce success to 9.3%, illustrating both the promise and the current limits of marker-free analytical retargeting.
- The work highlights the potential of vision-based retargeting for teleoperation while underscoring challenges like occlusion and environment clutter in achieving robust performance.
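The damped-least-squares IK solve mentioned above can be sketched as a single update step, \(\dot{q} = J^\top (J J^\top + \lambda^2 I)^{-1} e\). The sketch below iterates that step on a toy 2-link planar arm (not the SO-ARM101's kinematics); the damping factor, link lengths, and target are illustrative assumptions, not values from the paper.

```python
import numpy as np

def dls_ik_step(J, err, damping=0.05):
    """One damped-least-squares IK update.
    J: (m, n) Jacobian; err: (m,) task-space error.
    Returns a joint-space update dq of shape (n,)."""
    m = J.shape[0]
    # Damping keeps the inverse well-conditioned near singularities.
    A = J @ J.T + (damping ** 2) * np.eye(m)
    return J.T @ np.linalg.solve(A, err)

# Toy 2-link planar arm (hypothetical stand-in for the real robot).
def fk(q, l1=1.0, l2=1.0):
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])

def jacobian(q, l1=1.0, l2=1.0):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

q = np.array([0.3, 0.5])
target = np.array([1.2, 0.8])  # reachable: ||target|| < l1 + l2
for _ in range(200):
    q = q + dls_ik_step(jacobian(q), target - fk(q))
print(np.linalg.norm(target - fk(q)) < 1e-3)  # converged
```

In the paper's pipeline the same update would run against the 6-DOF arm's Jacobian inside PyBullet, with the retargeted hand pose supplying the task-space target at each frame.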
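The gripper mapping from thumb-index geometry can be sketched as a distance-to-aperture normalization with clamping. The calibration thresholds `closed_dist` and `open_dist` below are assumed values, and the paper's four-level fallback logic is not reproduced here; this shows only the basic geometric mapping.

```python
def thumb_index_aperture(thumb_tip, index_tip,
                         closed_dist=0.02, open_dist=0.10):
    """Map thumb-to-index fingertip distance (metres) to a gripper
    aperture command in [0, 1]: 0 = fully closed, 1 = fully open.
    closed_dist / open_dist are assumed calibration constants."""
    # Euclidean distance between the two 3D fingertip landmarks.
    d = sum((a - b) ** 2 for a, b in zip(thumb_tip, index_tip)) ** 0.5
    t = (d - closed_dist) / (open_dist - closed_dist)
    return min(1.0, max(0.0, t))  # clamp outside the calibrated range

# Fingertips 6 cm apart land halfway through the calibrated range.
print(thumb_index_aperture((0, 0, 0), (0.06, 0, 0)))  # 0.5
```

In the full system this command would be previewed in simulation and then replayed on the physical gripper through the LeRobot framework, with the fallback levels covering frames where one or both fingertip landmarks are unreliable.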