Accelerating trajectory optimization with Sobolev-trained diffusion policies

arXiv cs.LG / 4/22/2026


Key Points

  • The paper proposes improving trajectory optimization (TO) by warm-starting gradient-based TO solvers with initial guesses generated by diffusion-based learned policies rather than solving each instance from scratch.
  • A key difficulty addressed is that TO-generated demonstrations are only locally optimal, so small deviations during policy rollout can move the system into off-distribution states and cause compounding errors over long horizons.
  • The authors introduce a Sobolev-learning approach for diffusion policies that uses not only trajectories but also feedback gains, deriving a first-order loss tailored to this setting.
  • Experiments show the resulting policy can avoid compounding errors, learn from very few trajectories, and reduce TO solve time by 2× to 20×.
  • By incorporating first-order information, the method requires fewer diffusion steps for accurate predictions, lowering inference latency.
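The Sobolev-learning idea in the bullets above — supervising the policy on both the demonstrated actions and the solver's feedback gains — can be illustrated with a toy loss. The sketch below uses a linear policy so its Jacobian is available in closed form; the function name `sobolev_loss` and the linear parameterization are illustrative assumptions, not the paper's actual (diffusion-based) formulation.

```python
import numpy as np

def sobolev_loss(W, b, x, u_demo, K_demo, lam=1.0):
    """Toy zeroth- plus first-order (Sobolev) imitation loss for a
    linear policy u = W @ x + b (illustrative stand-in for the
    paper's diffusion policy).

    - zeroth-order term: match the demonstrated action u_demo
    - first-order term: match the policy Jacobian du/dx (which is
      simply W for a linear policy) to the TO solver's feedback
      gain K_demo
    """
    u_pred = W @ x + b
    value_term = np.sum((u_pred - u_demo) ** 2)
    grad_term = np.sum((W - K_demo) ** 2)  # Jacobian of a linear policy is W
    return value_term + lam * grad_term
```

When the policy reproduces both the demonstrated action and the demonstrated gain, the loss vanishes; any mismatch in either term is penalized, which is what lets the first-order term correct off-distribution drift near the demonstrations.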

Abstract

Trajectory Optimization (TO) solvers exploit known system dynamics to compute locally optimal trajectories through iterative improvements. A downside is that each new problem instance is solved independently; therefore, convergence speed and quality of the solution found depend on the initial trajectory proposed. To improve efficiency, a natural approach is to warm-start TO with initial guesses produced by a learned policy trained on trajectories previously generated by the solver. Diffusion-based policies have recently emerged as expressive imitation learning models, making them promising candidates for this role. Yet, a counterintuitive challenge comes from the local optimality of TO demonstrations: when a policy is rolled out, small non-optimal deviations may push it into situations not represented in the training data, triggering compounding errors over long horizons. In this work, we focus on learning-based warm-starting for gradient-based TO solvers that also provide feedback gains. Exploiting this specificity, we derive a first-order loss for Sobolev learning of diffusion-based policies using both trajectories and feedback gains. Through comprehensive experiments, we demonstrate that the resulting policy avoids compounding errors, and so can learn from very few trajectories to provide initial guesses reducing solving time by 2× to 20×. Incorporating first-order information enables predictions with fewer diffusion steps, reducing inference latency.
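The warm-starting effect the abstract describes can be demonstrated on a toy problem: an iterative solver started near the optimum needs fewer iterations than one started from scratch. The sketch below uses plain gradient descent on a quadratic surrogate cost as a hypothetical stand-in for a real TO solver; `solve_to`, the cost, and all parameters are assumptions for illustration only.

```python
import numpy as np

def solve_to(x_init, grad, lr=0.1, tol=1e-6, max_iter=10_000):
    """Plain gradient descent standing in for a TO solver.
    Iterates until the gradient norm falls below tol and
    returns (solution, iterations used)."""
    x = x_init.copy()
    for it in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, it
        x = x - lr * g
    return x, max_iter

# Quadratic surrogate cost 0.5 * ||x - x_star||^2 with known optimum x_star.
x_star = np.linspace(0.0, 1.0, 20)      # the "trajectory" to recover
grad = lambda x: x - x_star             # gradient of the surrogate cost

cold, it_cold = solve_to(np.zeros(20), grad)     # solved from scratch
warm, it_warm = solve_to(x_star + 0.01, grad)    # near-optimal initial guess
```

The warm-started run converges in fewer iterations than the cold start; in the paper the initial guess comes from the learned diffusion policy rather than a perturbed optimum, but the mechanism is the same.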