Accelerating trajectory optimization with Sobolev-trained diffusion policies

arXiv cs.LG / 4/22/2026


Key Points

  • The paper proposes improving trajectory optimization (TO) by warm-starting gradient-based TO solvers with initial guesses generated by diffusion-based learned policies rather than solving each instance from scratch.
  • A key difficulty addressed is that TO-generated demonstrations are only locally optimal, so small deviations during policy rollout can move the system into off-distribution states and cause compounding errors over long horizons.
  • The authors introduce a Sobolev-learning approach for diffusion policies that uses not only trajectories but also feedback gains, deriving a first-order loss tailored to this setting.
  • Experiments show the resulting policy can avoid compounding errors, learn from very few trajectories, and reduce TO solve time by 2× to 20×.
  • By incorporating first-order information, the method requires fewer diffusion steps for accurate predictions, lowering inference latency.
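The Sobolev-learning idea in the bullets above — supervising the policy on both the demonstrated actions and the solver's feedback gains — can be illustrated with a toy loss. The sketch below uses a linear policy so its Jacobian is available in closed form; the function name `sobolev_loss` and the linear parameterization are illustrative assumptions, not the paper's actual (diffusion-based) formulation.

```python
import numpy as np

def sobolev_loss(W, b, x, u_demo, K_demo, lam=1.0):
    """Toy zeroth- plus first-order (Sobolev) imitation loss for a
    linear policy u = W @ x + b (illustrative stand-in for the
    paper's diffusion policy).

    - zeroth-order term: match the demonstrated action u_demo
    - first-order term: match the policy Jacobian du/dx (which is
      simply W for a linear policy) to the TO solver's feedback
      gain K_demo
    """
    u_pred = W @ x + b
    value_term = np.sum((u_pred - u_demo) ** 2)
    grad_term = np.sum((W - K_demo) ** 2)  # Jacobian of a linear policy is W
    return value_term + lam * grad_term
```

When the policy reproduces both the demonstrated action and the demonstrated gain, the loss vanishes; any mismatch in either term is penalized, which is what lets the first-order term correct off-distribution drift near the demonstrations.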

Abstract

Trajectory Optimization (TO) solvers exploit known system dynamics to compute locally optimal trajectories through iterative improvements. A downside is that each new problem instance is solved independently; therefore, convergence speed and quality of the solution found depend on the initial trajectory proposed. To improve efficiency, a natural approach is to warm-start TO with initial guesses produced by a learned policy trained on trajectories previously generated by the solver. Diffusion-based policies have recently emerged as expressive imitation learning models, making them promising candidates for this role. Yet, a counterintuitive challenge comes from the local optimality of TO demonstrations: when a policy is rolled out, small non-optimal deviations may push it into situations not represented in the training data, triggering compounding errors over long horizons. In this work, we focus on learning-based warm-starting for gradient-based TO solvers that also provide feedback gains. Exploiting this specificity, we derive a first-order loss for Sobolev learning of diffusion-based policies using both trajectories and feedback gains. Through comprehensive experiments, we demonstrate that the resulting policy avoids compounding errors, and so can learn from very few trajectories to provide initial guesses reducing solving time by 2× to 20×. Incorporating first-order information enables predictions with fewer diffusion steps, reducing inference latency.
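The warm-starting effect the abstract describes can be demonstrated on a toy problem: an iterative solver started near the optimum needs fewer iterations than one started from scratch. The sketch below uses plain gradient descent on a quadratic surrogate cost as a hypothetical stand-in for a real TO solver; `solve_to`, the cost, and all parameters are assumptions for illustration only.

```python
import numpy as np

def solve_to(x_init, grad, lr=0.1, tol=1e-6, max_iter=10_000):
    """Plain gradient descent standing in for a TO solver.
    Iterates until the gradient norm falls below tol and
    returns (solution, iterations used)."""
    x = x_init.copy()
    for it in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, it
        x = x - lr * g
    return x, max_iter

# Quadratic surrogate cost 0.5 * ||x - x_star||^2 with known optimum x_star.
x_star = np.linspace(0.0, 1.0, 20)      # the "trajectory" to recover
grad = lambda x: x - x_star             # gradient of the surrogate cost

cold, it_cold = solve_to(np.zeros(20), grad)     # solved from scratch
warm, it_warm = solve_to(x_star + 0.01, grad)    # near-optimal initial guess
```

The warm-started run converges in fewer iterations than the cold start; in the paper the initial guess comes from the learned diffusion policy rather than a perturbed optimum, but the mechanism is the same.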