Human Pose Estimation in Trampoline Gymnastics: Improving Performance Using a New Synthetic Dataset

arXiv cs.CV / 4/3/2026



Key Points

The study addresses poor human pose estimation performance in trampoline gymnastics, where athletes adopt extreme poses and are captured from uncommon viewpoints.
Researchers introduce a new synthetic dataset, STP (synthetic trampoline poses), generated from motion-capture recordings of trampoline routines by fitting the noisy motion-capture data to a parametric human model and rendering realistic multi-view images.
A ViTPose model is fine-tuned on STP, and the improved 2D keypoint accuracy carries over to better 3D pose reconstruction via triangulation.
On challenging real multi-view trampoline images, the fine-tuned model achieves state-of-the-art 2D results and reduces 3D MPJPE by 12.5 mm (a 19.6% improvement over the pretrained ViTPose).
The work narrows the performance gap between “common” pose scenarios and highly atypical gymnastics poses, demonstrating the value of synthetic data for domain-specific perception.
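
To illustrate why better 2D keypoints lead to better 3D poses, here is a minimal sketch of multi-view triangulation for a single joint. It uses the standard linear (DLT) method, which is an assumption: the summary does not say which triangulation method the authors use. The camera matrices, intrinsics, and the test point are hypothetical values chosen only for illustration.

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Triangulate one 3D point from two or more views with linear DLT.

    projections: list of 3x4 camera projection matrices.
    points_2d:   list of (u, v) pixel observations, one per camera.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point to pixel coordinates with a 3x4 projection matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical setup: two identical cameras, the second shifted 1 m along x.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 4.0])  # hypothetical joint position in metres
observations = [project(P1, X_true), project(P2, X_true)]
X_est = triangulate_dlt([P1, P2], observations)
```

With noise-free observations, `X_est` recovers `X_true` up to numerical precision. Any pixel error in the 2D keypoints perturbs the linear system, so more accurate 2D detections translate directly into more accurate 3D joints.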

Abstract

Trampoline gymnastics involves extreme human poses and uncommon viewpoints, on which state-of-the-art pose estimation models tend to underperform. We demonstrate that this problem can be addressed by fine-tuning a pose estimation model on a dataset of synthetic trampoline poses (STP). STP is generated from motion capture recordings of trampoline routines. We develop a pipeline to fit noisy motion capture data to a parametric human model, then generate realistic multi-view images. We use this data to fine-tune a ViTPose model, and test it on real multi-view trampoline images. The resulting model exhibits accuracy improvements in 2D, which translate to improved 3D triangulation. In 2D, we obtain state-of-the-art results on such challenging data, bridging the performance gap between common and extreme poses. In 3D, we reduce the MPJPE by 12.5 mm with our best model, which represents an improvement of 19.6% compared to the pretrained ViTPose model.
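
For readers unfamiliar with the metric, the sketch below shows how MPJPE (mean per-joint position error) is commonly computed, and what the two reported figures imply when taken together. The toy joint coordinates are hypothetical. The baseline and fine-tuned values are derived from the reported 12.5 mm and 19.6% only; they are not stated in the abstract, and because 19.6% is rounded they are approximate.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth joints.

    pred, gt: arrays of shape (..., J, 3) in the same units (e.g. mm).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Hypothetical toy pose with two joints: errors of 3 mm and 5 mm.
gt = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0]])
pred = np.array([[3.0, 0.0, 0.0], [100.0, 4.0, 3.0]])
toy_error = mpjpe(pred, gt)  # (3 + 5) / 2 = 4.0 mm

# Figures reported in the abstract.
reduction_mm = 12.5
relative_improvement = 0.196

# Implied values (derived here, not reported in the abstract).
implied_baseline_mm = reduction_mm / relative_improvement    # about 63.8 mm
implied_finetuned_mm = implied_baseline_mm - reduction_mm    # about 51.3 mm
```

Under this reading, the pretrained ViTPose model would sit at roughly 63.8 mm MPJPE and the best fine-tuned model at roughly 51.3 mm.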