MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave Video
arXiv cs.AI / 5/4/2026
Key Points
- The paper introduces MAEPose, a masked autoencoding method that performs human pose estimation directly on mmWave spectrogram video, aiming to preserve radar spatiotemporal information rather than relying on pre-extracted representations.
- MAEPose is trained using unlabelled radar video to learn motion-aware generalized representations, and then uses a heatmap decoder to produce multi-frame pose predictions.
- Experiments on three datasets using leave-one-person-out cross-validation show that MAEPose outperforms prior baselines, improving MPJPE (mean per-joint position error) by up to 22.1% with statistical significance (p<0.05).
- The model remains relatively robust in zero-shot scenarios with bystander interference, showing only a 6.5% error increase, and ablation studies highlight the importance of both pre-training and the heatmap decoder.
- Modality analysis finds that using Range-Doppler video as input yields better pose performance than Range-Azimuth (or their fusion) while also reducing computational cost.
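
The evaluation protocol mentioned in the key points (MPJPE under leave-one-person-out cross-validation) can be illustrated with a minimal sketch. This is not the paper's code: the function names `mpjpe`, `relative_change`, and `leave_one_person_out` are hypothetical, poses are assumed to be nested lists of joint coordinates, and any alignment or unit conventions the authors apply before computing MPJPE are not covered here.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error.

    pred, target: list of frames; each frame is a list of joints;
    each joint is a tuple of coordinates (2D or 3D).
    Returns the Euclidean distance averaged over all joints and frames.
    """
    total, count = 0.0, 0
    for frame_pred, frame_target in zip(pred, target):
        for joint_pred, joint_target in zip(frame_pred, frame_target):
            total += math.dist(joint_pred, joint_target)
            count += 1
    return total / count


def relative_change(baseline_error, model_error):
    """Percentage change of model_error relative to baseline_error.

    Negative values mean the model has lower error than the baseline.
    """
    return (model_error - baseline_error) / baseline_error * 100.0


def leave_one_person_out(person_ids):
    """Yield (held_out_person, train_indices, test_indices) per fold.

    person_ids: one subject identifier per sample (assumed layout).
    Each fold tests on all samples of one person and trains on the rest.
    """
    for held_out in sorted(set(person_ids)):
        train = [i for i, p in enumerate(person_ids) if p != held_out]
        test = [i for i, p in enumerate(person_ids) if p == held_out]
        yield held_out, train, test


# Toy data for illustration only; these numbers are not from the paper.
toy_target = [[(0.0, 0.0), (1.0, 1.0)]]
toy_pred = [[(3.0, 4.0), (1.0, 1.0)]]
toy_error = mpjpe(toy_pred, toy_target)
toy_folds = list(leave_one_person_out(["a", "b", "a", "c"]))
```

Splitting by person rather than by sample keeps every recording of the held-out subject out of training, so each fold measures generalization to an unseen person.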