MuPPet: Multi-person 2D-to-3D Pose Lifting

arXiv cs.CV / 4/14/2026


Key Points

  • MuPPet is a multi-person method that recovers 3D poses from 2D human pose estimates, distinguished by explicitly modeling inter-person correlations (interactions).
  • Person Encoding, Permutation Augmentation, and Dynamic Multi-Person Attention respectively provide structured individual representations, diversified training data, and an attention mechanism that adapts to group size and relationships.
  • On group interaction datasets, the paper reports large accuracy gains over prior single-person and existing multi-person 2D-to-3D lifting methods.
  • Robustness under occlusion also improves, underscoring the importance of socially consistent multi-person 3D pose estimation.
  • The implementation code is publicly available on GitHub.
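To make the Permutation Augmentation idea concrete, here is a minimal sketch of what shuffling person order during training might look like. The function name, array shapes, and pairing of 2D inputs with 3D targets are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def permute_persons(poses_2d, poses_3d, rng):
    """Randomly permute the person axis of paired 2D/3D poses.

    poses_2d: (P, J, 2) array of 2D joint coordinates for P persons.
    poses_3d: (P, J, 3) array of the matching 3D targets.
    Shuffling person order encourages an order-agnostic lifter:
    predictions should not depend on how individuals are indexed.
    """
    perm = rng.permutation(poses_2d.shape[0])
    return poses_2d[perm], poses_3d[perm]

# Toy example: 3 persons, 17 joints each (hypothetical sizes).
rng = np.random.default_rng(0)
p2d = np.arange(3 * 17 * 2, dtype=float).reshape(3, 17, 2)
p3d = np.arange(3 * 17 * 3, dtype=float).reshape(3, 17, 3)
a2d, a3d = permute_persons(p2d, p3d, rng)
```

The key property is that the same permutation is applied to inputs and targets, so each 2D pose stays aligned with its own 3D ground truth.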

Abstract

Multi-person social interactions are inherently built on coherence and relationships among all individuals within the group, making multi-person localization and body pose estimation essential to understanding these social dynamics. One promising approach is 2D-to-3D pose lifting, which provides a 3D human pose with rich spatial detail by building on the significant advances in 2D pose estimation. However, existing 2D-to-3D pose lifting methods often neglect inter-person relationships or cannot handle varying group sizes, limiting their effectiveness in multi-person settings. We propose MuPPet, a novel multi-person 2D-to-3D pose lifting framework that explicitly models inter-person correlations. To leverage these inter-person dependencies, our approach introduces Person Encoding to structure individual representations, Permutation Augmentation to enhance training diversity, and Dynamic Multi-Person Attention to adaptively model correlations between individuals. Extensive experiments on group interaction datasets demonstrate that MuPPet significantly outperforms state-of-the-art single- and multi-person 2D-to-3D pose lifting methods, and improves robustness in occlusion scenarios. Our findings highlight the importance of modeling inter-person correlations, paving the way for accurate and socially-aware 3D pose estimation. Our code is available at: https://github.com/Thomas-Markhorst/MuPPet
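One common way to handle varying group sizes, as Dynamic Multi-Person Attention must, is attention over per-person tokens with a validity mask that zeroes out padded slots. The sketch below assumes this masking approach; the function name, shapes, and use of plain scaled dot-product attention are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def person_attention(tokens, valid):
    """Scaled dot-product attention across person tokens.

    tokens: (P, D) one embedding per (possibly padded) person slot.
    valid:  (P,) boolean mask; False marks padding, so groups of any
            size can share one fixed-width batch layout.
    Each person's output mixes features of all valid persons, which is
    one simple way to model inter-person correlations.
    """
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)          # (P, P) pairwise affinities
    scores = np.where(valid[None, :], scores, -1e9)  # mask padded keys
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens

# A group of 3 real persons plus one padded slot (hypothetical sizes).
toks = np.random.default_rng(1).normal(size=(4, 8))
mask = np.array([True, True, True, False])
out = person_attention(toks, mask)
```

Because masked keys receive effectively zero attention weight, the padded slot cannot influence the valid persons' outputs, letting one model serve groups of any size.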