Emotion-Conditioned Short-Horizon Human Pose Forecasting with a Lightweight Predictive World Model
arXiv cs.CV · April 28, 2026
📰 News · Models & Research
Key Points
- The paper studies whether facial-expression-derived emotion embeddings can improve short-horizon human pose prediction, especially for emotion-driven motion dynamics that geometric cues alone may miss.
- It proposes a lightweight autoregressive “predictive world model” that fuses pose keypoints and emotion embeddings through a learnable gating mechanism and rolls out 15-step pose forecasts with a two-layer LSTM.
- Experiments on two small pose–emotion video datasets (one controlled, one natural with larger facial-expression changes) find that naive multimodal fusion does not reliably improve accuracy, while normalized gating fusion significantly improves performance on emotion-driven sequences.
- Counterfactual perturbation tests show that the predicted pose trajectory changes measurably when the multimodal inputs are altered, indicating that the emotion embeddings act as a meaningful auxiliary conditioning signal rather than redundant information.
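The gated fusion and autoregressive rollout described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: all dimensions, weight matrices, and function names are hypothetical, the weights are random rather than learned, and a simple linear residual readout stands in for the two-layer LSTM.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
D_POSE, D_EMO, D_HID = 34, 8, 32   # hypothetical dimensions

# Hypothetical "learned" parameters (randomly initialized here).
W_gate = rng.normal(scale=0.1, size=(D_HID, D_POSE + D_EMO))
W_pose = rng.normal(scale=0.1, size=(D_HID, D_POSE))
W_emo  = rng.normal(scale=0.1, size=(D_HID, D_EMO))
W_out  = rng.normal(scale=0.1, size=(D_POSE, D_HID))

def gated_fuse(pose, emo):
    """Gating fusion: a learned gate in (0, 1) decides, per hidden
    feature, how much the emotion embedding contributes."""
    g = sigmoid(W_gate @ np.concatenate([pose, emo]))
    return g * (W_pose @ pose) + (1.0 - g) * (W_emo @ emo)

def rollout(pose0, emo, steps=15):
    """Autoregressive 15-step forecast: each predicted pose is fed
    back as input to the next step (a linear residual update stands
    in for the paper's two-layer LSTM)."""
    pose, traj = pose0, []
    for _ in range(steps):
        pose = pose + W_out @ gated_fuse(pose, emo)
        traj.append(pose)
    return np.stack(traj)

pose0 = rng.normal(size=D_POSE)   # e.g. 17 keypoints x (x, y)
emo   = rng.normal(size=D_EMO)    # facial-expression embedding
traj  = rollout(pose0, emo)       # shape: (15, D_POSE)

# Counterfactual perturbation test: altering the emotion input should
# measurably change the predicted trajectory if the signal is used.
traj_cf = rollout(pose0, emo + rng.normal(size=D_EMO))
print(traj.shape, float(np.linalg.norm(traj - traj_cf)))
```

The counterfactual check at the end mirrors the paper's probe: if the trajectories coincide regardless of the emotion input, the model has learned to ignore that modality.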
Related Articles
- Behind the Scenes of a Self-Evolving AI: The Architecture of Tian AI (Dev.to)
- Abliterlitics: Benchmarks and Tensor Comparison for Heretic, Abliterlix, Huiui, HauhauCS for GLM 4.7 Flash (Reddit r/LocalLLaMA)
- Record $1.1B Seed Funding for Reinforcement Learning Startup (AI Business)
- The One Substrate Failure Behind Every AI System in 2026 (Reddit r/artificial)
- Into the Omniverse: Manufacturing’s Simulation-First Era Has Arrived (Nvidia AI Blog)