SnapPose3D: Diffusion-Based Single-Frame 2D-to-3D Lifting of Human Poses

arXiv cs.CV / 4/30/2026


Key Points

SnapPose3D tackles 2D-to-3D human pose lifting challenges caused by depth ambiguity and joint uncertainty by producing multiple pose hypotheses rather than a single deterministic estimate.
The method uses diffusion-based generation to denoise 3D poses conditioned on visual context and 2D pose features, and then aggregates sampled hypotheses into a final pose.
Unlike many prior approaches that rely on temporal sequences to resolve ambiguity, SnapPose3D operates on single frames, avoiding tracking and reducing computational and data-collection complexity.
The framework is trained deterministically but performs probabilistic multi-hypothesis sampling during inference, yielding state-of-the-art performance on standard 3D human pose estimation benchmarks.
Overall, the paper demonstrates that diffusion models can effectively handle pose ambiguity in lifting tasks while maintaining practical efficiency for non-sequential inputs.
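The depth ambiguity mentioned in the key points can be made concrete with a small worked example. The sketch below assumes an ideal pinhole camera with a hypothetical focal length of 1.0 and made-up joint coordinates; it is not taken from the paper. It shows that two 3D points on the same camera ray project to the same 2D location, so a 2D joint alone cannot determine depth.

```python
def project(point_3d, focal=1.0):
    """Pinhole projection of a camera-space point (X, Y, Z) to image coordinates (u, v)."""
    x, y, z = point_3d
    return (focal * x / z, focal * y / z)


# Two hypothetical wrist positions on the same camera ray:
# the second is simply the first scaled to twice the depth.
wrist_near = (0.5, -0.25, 2.0)
wrist_far = (1.0, -0.5, 4.0)

print(project(wrist_near))  # (0.25, -0.125)
print(project(wrist_far))   # (0.25, -0.125)
```

Both candidates are equally consistent with the 2D observation, which is why a lifting model benefits from producing several hypotheses instead of committing to one.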

Abstract

Depth ambiguity and joint uncertainty are the two main obstacles to accurate human pose prediction with the 2D-to-3D lifting methods proposed in the literature. In particular, these issues arise because a 2D joint location can map to multiple 3D positions, which induces multiple plausible final poses. Following these considerations, we propose leveraging the generative capability of diffusion-based models to predict multiple hypotheses and aggregate them into a final, accurate pose. To this end, we introduce SnapPose3D, a pose-lifting framework trained deterministically to denoise 3D poses conditioned on both visual context and 2D pose features. SnapPose3D adopts a probabilistic approach during inference, generating multiple hypotheses through random sampling from a unit Gaussian distribution. Unlike most previous methods that address pose ambiguity by processing temporal sequences, SnapPose3D takes single frames as input, which avoids tracking and limits computational cost and data acquisition complexity, as online, real-time applications require. We extensively evaluate SnapPose3D on well-known benchmarks for 3D human pose estimation, showing its ability to generate and aggregate accurate hypotheses that lead to state-of-the-art results.
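The inference procedure described in the abstract (sample from a unit Gaussian, denoise conditioned on the 2D pose, aggregate the hypotheses) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_denoise_step` is a hypothetical stand-in for the learned denoiser, it ignores visual context, and joint-wise averaging is only one possible aggregation rule, since the abstract does not specify which one SnapPose3D uses.

```python
import random
from statistics import fmean


def toy_denoise_step(pose_3d, pose_2d, blend=0.5):
    """Hypothetical stand-in for one learned denoising step.

    Pulls x and y toward the 2D evidence and shrinks z toward 0.
    A real model would be a trained network that also sees image features.
    """
    return [
        (x + blend * (u - x), y + blend * (v - y), z * (1.0 - blend))
        for (x, y, z), (u, v) in zip(pose_3d, pose_2d)
    ]


def sample_hypotheses(pose_2d, num_hypotheses=5, num_steps=4, seed=0):
    """Draw several 3D pose hypotheses for a single 2D pose."""
    rng = random.Random(seed)
    hypotheses = []
    for _ in range(num_hypotheses):
        # Each hypothesis starts from unit-Gaussian noise, one sample per coordinate.
        pose = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in pose_2d]
        for _ in range(num_steps):
            pose = toy_denoise_step(pose, pose_2d)
        hypotheses.append(pose)
    return hypotheses


def aggregate_mean(hypotheses):
    """Joint-wise average of all hypotheses (an assumed aggregation rule)."""
    num_joints = len(hypotheses[0])
    return [
        tuple(fmean(h[j][axis] for h in hypotheses) for axis in range(3))
        for j in range(num_joints)
    ]


# Made-up 2D joints in normalized image coordinates.
pose_2d = [(0.0, 0.0), (0.1, 0.4), (-0.2, 0.8)]
hypotheses = sample_hypotheses(pose_2d)
final_pose = aggregate_mean(hypotheses)
print(len(hypotheses), len(final_pose))
```

Because each hypothesis starts from a different noise sample, the residual depth differs across hypotheses while x and y converge toward the 2D input; the aggregation step then collapses the set into a single estimate.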


