Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics

arXiv cs.LG / 4/24/2026


Key Points

  • The paper frames finite-horizon trajectory and policy optimization under differentiable dynamics as an inference problem: minimizing a KL-regularized expected trajectory cost yields an optimal “Boltzmann-tilted” distribution over controller parameters (written out after this list).
  • It introduces tempered sequential Monte Carlo (TSMC), which anneals from a prior to the target distribution while adaptively reweighting and resampling particles to handle sharp, potentially multimodal targets efficiently.
  • To preserve particle diversity, TSMC applies Hamiltonian Monte Carlo rejuvenation, exploiting exact gradients obtained by differentiating through trajectory rollouts.
  • For policy optimization, the method is extended with a deterministic empirical approximation of the initial-state distribution and an extended-space formulation that treats rollout randomness as auxiliary variables.
  • Experiments on trajectory- and policy-optimization benchmarks indicate that TSMC is broadly applicable and compares favorably with state-of-the-art baselines.
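
For reference, the objective in the first point and its closed-form solution can be written out explicitly; the symbols below (θ for controller parameters, J for expected trajectory cost, p for the prior, λ for the temperature) are our notation rather than necessarily the paper's. The variational problem and its standard Gibbs-form solution are:

```latex
\[
q^{\star} \;=\; \arg\min_{q}\; \mathbb{E}_{\theta \sim q}\!\left[ J(\theta) \right] \;+\; \lambda\, \mathrm{KL}\!\left( q \,\|\, p \right)
\quad\Longrightarrow\quad
q^{\star}(\theta) \;\propto\; p(\theta)\, \exp\!\left( -J(\theta)/\lambda \right),
\]
```

which concentrates its mass on low-cost parameters as λ → 0, matching the "concentrates on low-cost solutions as temperature decreases" behavior described above.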

Abstract

We propose a sampling-based framework for finite-horizon trajectory and policy optimization under differentiable dynamics by casting controller design as inference. Specifically, we minimize a KL-regularized expected trajectory cost, which yields an optimal "Boltzmann-tilted" distribution over controller parameters that concentrates on low-cost solutions as temperature decreases. To sample efficiently from this sharp, potentially multimodal target, we introduce tempered sequential Monte Carlo (TSMC): an annealing scheme that adaptively reweights and resamples particles along a tempering path from a prior to the target distribution, while using Hamiltonian Monte Carlo rejuvenation to maintain diversity and exploit exact gradients obtained by differentiating through trajectory rollouts. For policy optimization, we extend TSMC via (i) a deterministic empirical approximation of the initial-state distribution and (ii) an extended-space construction that treats rollout randomness as auxiliary variables. Experiments across trajectory- and policy-optimization benchmarks show that TSMC is broadly applicable and compares favorably to state-of-the-art baselines.
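
To make the annealing scheme concrete, here is a minimal JAX sketch of the tempering loop described in the abstract. This is our illustrative reconstruction, not the authors' code: the toy `rollout_cost`, the fixed linear temperature ladder, the ESS-based resampling threshold, and the single unadjusted Langevin rejuvenation step (in place of full HMC with accept/reject and adaptive tempering) are all simplifying assumptions.

```python
# Illustrative TSMC sketch (not the authors' implementation). Anneals from the
# prior (beta = 0) to the tilted target p(theta) * exp(-J(theta)/lam) (beta = 1),
# reweighting particles, resampling when the effective sample size (ESS) drops,
# and rejuvenating with a gradient step that differentiates through the cost.
import jax
import jax.numpy as jnp

def rollout_cost(theta):
    # Stand-in for a differentiable trajectory rollout: a toy bimodal
    # cost with minima near theta = +/-2 in each dimension.
    return jnp.sum((theta**2 - 4.0) ** 2) / 8.0

cost_batch = jax.vmap(rollout_cost)
grad_batch = jax.vmap(jax.grad(rollout_cost))

def tsmc(key, n_particles=256, dim=2, n_temps=20, lam=0.1, step=0.05):
    key, k0 = jax.random.split(key)
    theta = 3.0 * jax.random.normal(k0, (n_particles, dim))  # prior draws
    logw = jnp.zeros(n_particles)
    betas = jnp.linspace(0.0, 1.0, n_temps + 1)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Reweight by the incremental tempered likelihood.
        logw = logw - (b - b_prev) * cost_batch(theta) / lam
        # Resample when ESS falls below half the particle count.
        w = jax.nn.softmax(logw)
        ess = 1.0 / jnp.sum(w**2)
        if ess < n_particles / 2:
            key, kr = jax.random.split(key)
            idx = jax.random.choice(kr, n_particles, (n_particles,), p=w)
            theta, logw = theta[idx], jnp.zeros(n_particles)
        # Gradient-based rejuvenation: one unadjusted Langevin step on the
        # tempered cost (prior gradient omitted for brevity; the paper
        # uses full HMC with accept/reject).
        key, kn = jax.random.split(key)
        noise = jax.random.normal(kn, theta.shape)
        theta = theta - step * (b / lam) * grad_batch(theta) + jnp.sqrt(2.0 * step) * noise
    return theta

samples = tsmc(jax.random.PRNGKey(0))
```

On this toy cost, the returned particle set should cover both low-cost modes near ±2 rather than collapsing onto one, which is the diversity-preservation property the rejuvenation step is meant to provide.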