A Dataset and Evaluation for Complex 4D Markerless Human Motion Capture

arXiv cs.CV / April 15, 2026


Key Points

  • The paper introduces a new dataset and evaluation benchmark for complex 4D markerless human motion capture, designed to better reflect real-world challenges like multi-person interactions and heavy occlusions.
  • The dataset includes synchronized multi-view RGB and depth sequences with accurate camera calibration, ground-truth 3D motion from a Vicon system, and corresponding SMPL/SMPL-X parameters for tightly aligned supervision.
  • It covers both single- and multi-person scenarios featuring intricate motions, rapid position exchanges between similarly dressed subjects, varying subject distances, and frequent inter-person occlusions.
  • Benchmark results show that current state-of-the-art markerless 4D MoCap models experience substantial performance degradation when tested under these realistic conditions, revealing a persistent domain gap.
  • The authors report that targeted fine-tuning can improve generalization, suggesting the dataset is effective for driving more robust model development.
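The paper does not spell out its evaluation protocol in this summary, but markerless MoCap benchmarks of this kind are conventionally scored with Mean Per-Joint Position Error (MPJPE), the average Euclidean distance between predicted and ground-truth 3D joints, often with a root-aligned variant that discounts global translation. A minimal sketch of both metrics (the array shapes and the pelvis-as-root convention are illustrative assumptions, not details from the paper):

```python
import numpy as np

def mpjpe(pred, gt):
    # pred, gt: (frames, joints, 3) arrays of 3D joint positions.
    # Per-joint Euclidean error, averaged over all joints and frames.
    return np.linalg.norm(pred - gt, axis=-1).mean()

def mpjpe_root_aligned(pred, gt, root=0):
    # Subtract each skeleton's root joint (commonly the pelvis) before
    # measuring error, removing global translation from the comparison.
    return mpjpe(pred - pred[:, root:root + 1], gt - gt[:, root:root + 1])

# Toy usage: a uniform 5 cm offset in every coordinate.
gt = np.zeros((2, 17, 3))
pred = gt + 0.05
print(mpjpe(pred, gt))          # sqrt(3) * 0.05 ≈ 0.0866 (same units as input)
print(mpjpe_root_aligned(pred, gt))  # 0.0 — the offset is pure translation
```

Reporting both numbers is useful for the occlusion-heavy, multi-person sequences this dataset targets: a large gap between the two indicates errors dominated by global localization rather than body pose.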

Abstract

Marker-based motion capture (MoCap) systems have long been the gold standard for accurate 4D human modeling, yet their reliance on specialized hardware and markers limits scalability and real-world deployment. Advancing reliable markerless 4D human motion capture requires datasets that reflect the complexity of real-world human interactions. Yet, existing benchmarks often lack realistic multi-person dynamics, severe occlusions, and challenging interaction patterns, leading to a persistent domain gap. In this work, we present a new dataset and evaluation for complex 4D markerless human motion capture. Our proposed MoCap dataset captures both single and multi-person scenarios with intricate motions, frequent inter-person occlusions, rapid position exchanges between similarly dressed subjects, and varying subject distances. It includes synchronized multi-view RGB and depth sequences, accurate camera calibration, ground-truth 3D motion capture from a Vicon system, and corresponding SMPL/SMPL-X parameters. This setup ensures precise alignment between visual observations and motion ground truth. Benchmarking state-of-the-art markerless MoCap models reveals substantial performance degradation under these realistic conditions, highlighting limitations of current approaches. We further demonstrate that targeted fine-tuning improves generalization, validating the dataset's realism and value for model development. Our evaluation exposes critical gaps in existing models and provides a rigorous foundation for advancing robust markerless 4D human motion capture.