SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control

arXiv cs.RO / 4/28/2026


Key Points

  • The paper introduces Score-Matching Motion Priors (SMP), a method for learning reusable, task-agnostic motion reward priors for physics-based character control.
  • Unlike prior adversarial imitation learning approaches, which typically must be retrained for each new controller and require keeping the reference motion data on hand for downstream tasks, SMP is pre-trained once on motion data and then reused without changing the model.
  • SMP leverages pre-trained motion diffusion models and score distillation sampling (SDS) to create reward functions that can remain frozen while training new control policies for downstream tasks.
  • Experiments on physically simulated humanoids show that a general prior trained on large-scale motion data can be repurposed into a variety of style-specific priors, and that multiple styles can be composed to synthesize new styles not present in the original dataset.
  • The authors report that the motions produced by SMP are competitive with state-of-the-art adversarial imitation learning methods across a broad set of control tasks.

Abstract

Data-driven motion priors that can guide agents toward producing naturalistic behaviors play a pivotal role in creating life-like virtual characters. Adversarial imitation learning has been a highly effective method for learning motion priors from reference motion data. However, adversarial priors, with few exceptions, need to be retrained for each new controller, thereby limiting their reusability and necessitating the retention of the reference motion data when applied to downstream tasks. In this work, we present Score-Matching Motion Priors (SMP), which leverages pre-trained motion diffusion models and score distillation sampling (SDS) to create reusable task-agnostic motion priors. SMPs can be pre-trained on a motion dataset, independent of any control policy or task. Once trained, SMPs can be kept frozen and reused as general-purpose reward functions to train new policies to produce naturalistic behaviors for downstream tasks. We show that a general motion prior trained on large-scale datasets can be repurposed into a variety of style-specific priors. Furthermore, SMP can compose different styles to synthesize new styles not present in the original dataset. Our method can create reusable and modular motion priors that produce high-quality motions comparable to state-of-the-art adversarial imitation learning methods. In our experiments, we demonstrate the effectiveness of SMP across a diverse suite of control tasks with physically simulated humanoid characters. Video available at https://youtu.be/jBA2tWk6vzU
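The abstract describes the mechanism only at a high level: a frozen, pre-trained motion diffusion model is queried in the style of score distillation sampling to judge how natural a policy's motion looks, and that judgment serves as a task-agnostic reward. The sketch below is a minimal illustration of one plausible way such a reward could be computed; the names (`frozen_denoiser`, `sds_style_reward`), the cosine noise schedule, and the exponential error-to-reward mapping are assumptions made for illustration, not the paper's actual formulation.

```python
import math
import torch


def cosine_alpha_bar(t, T=1000, s=0.008):
    """Standard cosine noise schedule; returns alpha_bar(t) in (0, 1]."""
    f = torch.cos(((t.float() / T) + s) / (1 + s) * math.pi / 2) ** 2
    f0 = math.cos(s / (1 + s) * math.pi / 2) ** 2
    return f / f0


def sds_style_reward(frozen_denoiser, motion_window, num_t_samples=4, T=1000):
    """Illustrative reward from a frozen motion diffusion prior (assumed API).

    `frozen_denoiser(x_t, t)` is assumed to predict the noise eps added at
    diffusion step t. A policy-generated motion window of shape
    (batch, frames, dof) that the prior can denoise accurately is treated as
    lying close to the data manifold and receives a high reward.
    """
    batch = motion_window.shape[0]
    rewards = []
    with torch.no_grad():                       # the prior stays frozen; no gradients flow into it
        for _ in range(num_t_samples):          # average over a few sampled noise levels
            t = torch.randint(0, T, (batch,), device=motion_window.device)
            eps = torch.randn_like(motion_window)
            a_bar = cosine_alpha_bar(t, T).view(batch, 1, 1)
            x_t = a_bar.sqrt() * motion_window + (1.0 - a_bar).sqrt() * eps
            eps_pred = frozen_denoiser(x_t, t)
            # Lower denoising error => the motion agrees better with the prior.
            err = ((eps_pred - eps) ** 2).mean(dim=(1, 2))
            rewards.append(torch.exp(-err))     # map error to a (0, 1] reward
    return torch.stack(rewards).mean(dim=0)     # one reward per motion window
```

Because the prior never receives gradients and is only queried on short motion windows, the same frozen model could in principle be added as a reward term when training different downstream controllers, which is the reusability property the paper emphasizes.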