AI Navigate

Gesture-Aware Pretraining and Token Fusion for 3D Hand Pose Estimation

arXiv cs.CV / 3/19/2026

📰 News · Models & Research

Key Points

  • The paper proposes a gesture-aware pretraining framework for 3D hand pose estimation from monocular RGB images, leveraging gesture labels to provide a useful inductive bias.
  • It presents a two-stage pipeline consisting of gesture-aware pretraining to learn an informative embedding space from coarse and fine gesture labels, followed by a per-joint token Transformer that uses gesture embeddings to regress MANO hand parameters.
  • A layered training objective supervises MANO parameters, 3D joint positions, and structural constraints jointly.
  • Experiments on InterHand2.6M show that gesture-aware pretraining improves single-hand accuracy over the prior EANet baseline and generalizes across architectures without modification.
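The layered objective described above can be sketched as a weighted sum of a parameter term, a joint-position term, and a structural term. The following is a minimal illustrative sketch, not the authors' implementation: the loss weights, the choice of L2 vs. L1 per term, and the bone-length structural constraint are all assumptions.

```python
import numpy as np

def layered_loss(pred_params, gt_params,
                 pred_joints, gt_joints,
                 bone_pairs,
                 w_param=1.0, w_joint=1.0, w_struct=0.1):
    """Hypothetical layered objective: parameters + joints + structure.

    Weights and term definitions are illustrative assumptions, not the
    paper's exact formulation.
    """
    # L2 loss on predicted MANO pose/shape parameters
    l_param = np.mean((pred_params - gt_params) ** 2)
    # L1 loss on 3D joint positions, shape (J, 3)
    l_joint = np.mean(np.abs(pred_joints - gt_joints))

    # Structural constraint (assumed): match bone lengths between
    # connected joint pairs, which penalizes implausible hand skeletons
    def bone_lengths(joints):
        return np.array([np.linalg.norm(joints[a] - joints[b])
                         for a, b in bone_pairs])

    l_struct = np.mean(np.abs(bone_lengths(pred_joints)
                              - bone_lengths(gt_joints)))
    return w_param * l_param + w_joint * l_joint + w_struct * l_struct
```

A perfect prediction drives all three terms to zero; translating the whole hand leaves the bone-length term untouched while the joint term rises, which is the usual motivation for mixing absolute and structural supervision.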

Abstract

Estimating 3D hand pose from monocular RGB images is fundamental for applications in AR/VR, human-computer interaction, and sign language understanding. In this work, we focus on a scenario where a discrete set of gesture labels is available and show that gesture semantics can serve as a powerful inductive bias for 3D pose estimation. We present a two-stage framework: gesture-aware pretraining that learns an informative embedding space using coarse and fine gesture labels from InterHand2.6M, followed by a per-joint token Transformer that uses gesture embeddings as intermediate representations to regress the final MANO hand parameters. Training is driven by a layered objective over parameters, joints, and structural constraints. Experiments on InterHand2.6M demonstrate that gesture-aware pretraining consistently improves single-hand accuracy over the state-of-the-art EANet baseline, and that the benefit transfers across architectures without any modification.
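The gesture-to-token fusion step can be illustrated with a toy NumPy sketch: a gesture embedding is broadcast onto per-joint tokens, one self-attention pass mixes the conditioned tokens, and a linear head regresses a MANO parameter vector. All dimensions, the fusion-by-addition choice, and the single-layer design are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
J, D, N_MANO = 21, 32, 61  # joints, token dim, MANO param count (assumed)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_and_regress(joint_tokens, gesture_emb, Wq, Wk, Wv, Wout):
    """Hypothetical gesture-conditioned per-joint attention + regression."""
    # Fusion (assumed): broadcast-add the gesture embedding to each token
    tokens = joint_tokens + gesture_emb[None, :]       # (J, D)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv    # (J, D) each
    attn = softmax(q @ k.T / np.sqrt(D))               # (J, J) attention map
    ctx = attn @ v                                     # (J, D) mixed tokens
    # Pool joint context and regress MANO parameters with a linear head
    return ctx.mean(axis=0) @ Wout                     # (N_MANO,)

joint_tokens = rng.standard_normal((J, D))
gesture_emb = rng.standard_normal(D)
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
Wout = rng.standard_normal((D, N_MANO)) * 0.1
mano_params = fuse_and_regress(joint_tokens, gesture_emb, Wq, Wk, Wv, Wout)
```

Because the gesture embedding enters every joint token before attention, the same Transformer weights can condition on different gestures without architectural changes, which is consistent with the paper's claim that the benefit transfers across architectures.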