2D Pre-Training for 3D Pose Estimation

arXiv cs.CV / 4/28/2026

📰 News · Models & Research

Key Points

  • The paper proposes an expanded pre-training scheme for 3D Human Pose Estimation (HPE) that leverages a wider set of 2D and 3D datasets beyond limited benchmarks like Human3.6M.
  • It investigates how factors of 2D pre-training—such as model size—impact downstream 3D pose estimation performance and generalization across different datasets.
  • The results show that 2D pre-training consistently outperforms training on 3D data alone, with particularly strong gains in computational efficiency.
  • The authors report an MPJPE (Mean Per-Joint Position Error) below 64.5mm using MPII and Human3.6M, indicating improved accuracy under the proposed approach.
  • Overall, the study emphasizes that stronger 2D representation learning can improve 3D pose estimation while reducing training cost relative to 3D-only approaches.

Abstract

Pre-training is a general method used across a range of deep learning tasks. By first training a model on one task and then further training it on the downstream task used for final evaluation, the model is forced to learn a more general understanding of the input data. While pre-training has been applied to 3D Human Pose Estimation (HPE) before, the datasets used are typically limited to a few strong benchmarks, such as Human3.6M. In this project, we therefore expand an existing 3D HPE scheme to be compatible with additional 2D and 3D HPE datasets, such as Occlusion Person. We perform an extensive study of how aspects of 2D pre-training, such as model size, affect downstream performance, and of the extent to which pre-training helps the model generalize to different datasets. Experimental results show that 2D pre-training consistently outperforms training on 3D data alone, particularly in terms of computational efficiency. Finally, using MPII and Human3.6M, we obtain an MPJPE of under 64.5mm.
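For readers unfamiliar with the metric, MPJPE (Mean Per-Joint Position Error) is the average Euclidean distance between predicted and ground-truth 3D joint positions, typically reported in millimeters. The sketch below is illustrative, not the paper's code; the function name and array shapes are assumptions, and the 17-joint skeleton is the common Human3.6M convention.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: mean Euclidean distance between
    predicted and ground-truth joints, in the input units (e.g. mm).

    pred, gt: arrays of shape (num_frames, num_joints, 3).
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

# Toy example: predictions offset from ground truth by 50 mm along x.
gt = np.zeros((2, 17, 3))    # 2 frames, 17 joints (Human3.6M layout)
pred = gt.copy()
pred[..., 0] += 50.0         # uniform 50 mm error on every joint
print(mpjpe(pred, gt))       # 50.0
```

A reported MPJPE of under 64.5mm thus means that, averaged over frames and joints, each predicted joint lies within about 6.5cm of its ground-truth position.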