AG-EgoPose: Leveraging Action-Guided Motion and Kinematic Joint Encoding for Egocentric 3D Pose Estimation

Key Points

The spatial stream produces 2D joint heatmaps and joint-specific spatial feature tokens using a weight-sharing ResNet-18 encoder-decoder, while the temporal stream leverages ResNet-50 features processed through an action-recognition backbone to capture motion dynamics.

Abstract

Egocentric 3D human pose estimation remains challenging due to the severe perspective distortion, limited body visibility, and complex camera motion inherent in first-person viewpoints. Existing methods typically rely on single-frame analysis or limited temporal fusion, and so fail to exploit the rich motion context available in egocentric videos. We introduce AG-EgoPose, a novel dual-stream framework that integrates short- and long-range motion context with fine-grained spatial cues for robust pose estimation from fisheye camera input. Our framework features two parallel streams. A spatial stream uses a weight-sharing ResNet-18 encoder-decoder to generate 2D joint heatmaps and corresponding joint-specific spatial feature tokens. Simultaneously, a temporal stream uses a ResNet-50 backbone to extract visual features, which are then processed by an action recognition backbone to capture motion dynamics. These complementary representations are fused and refined in a transformer decoder with learnable joint tokens, which allows joint-level integration of spatial and temporal evidence while maintaining anatomical constraints. Experiments on real-world datasets demonstrate that AG-EgoPose achieves state-of-the-art performance in both quantitative and qualitative evaluations. Code is available at: https://github.com/Mushfiq5647/AG-EgoPose.
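The fusion step described in the abstract can be illustrated with a minimal NumPy sketch of single-head cross-attention, in which one query token per joint attends over the spatial and temporal tokens together. This is not the authors' implementation (see the linked repository for that). The joint count, frame count, feature width, random weights, concatenation of the two streams into one memory, and the linear 3D head are all assumptions chosen only to keep the example small and runnable.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, memory, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (num_queries, dim) tokens that ask the questions (one per joint).
    memory:  (num_tokens, dim) tokens that are attended over.
    Returns the attended features and the attention weights.
    """
    q = queries @ w_q
    k = memory @ w_k
    v = memory @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper.
num_joints, num_frames, dim = 16, 8, 32

# Stand-ins for the real inputs: in a trained model the joint queries would be
# learned parameters and the tokens would come from the two streams.
joint_queries = rng.normal(size=(num_joints, dim))
spatial_tokens = rng.normal(size=(num_joints, dim))    # one token per joint
temporal_tokens = rng.normal(size=(num_frames, dim))   # one token per frame

# Assumption: both streams are concatenated into a single attention memory.
memory = np.concatenate([spatial_tokens, temporal_tokens], axis=0)

# Random projection matrices stand in for trained weights.
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
fused, attn = cross_attention(joint_queries, memory, w_q, w_k, w_v)

# Assumption: a residual connection followed by a linear head gives (x, y, z).
w_out = rng.normal(size=(dim, 3)) / np.sqrt(dim)
pose_3d = (joint_queries + fused) @ w_out

print(pose_3d.shape)  # one 3D coordinate per joint
```

Each row of `attn` shows how one joint query distributes its attention across the spatial tokens (the first `num_joints` columns) and the temporal tokens (the remaining columns), which is the sense in which spatial and temporal evidence is integrated at the joint level.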
