Latent Chain-of-Thought World Modeling for End-to-End Driving
arXiv cs.RO · April 15, 2026
Key Points
- The paper introduces Latent Chain-of-Thought World Modeling (LCDrive) for end-to-end autonomous driving, aiming to improve safety and performance in challenging scenarios.
- Instead of text-based chain-of-thought, LCDrive reasons in a latent language that interleaves action-proposal tokens (drawn from a discrete action vocabulary) with world-model tokens capturing likely future outcomes.
- The model “cold starts” by supervising both action proposals and world-model tokens using ground-truth future rollouts, then further improves reasoning via closed-loop reinforcement learning.
- On a large-scale end-to-end driving benchmark, LCDrive reports faster inference, higher trajectory quality, and stronger gains from interactive reinforcement learning than non-reasoning and text-reasoning baselines.
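The interleaved layout described above can be sketched as a toy decoding loop: each step emits one action-proposal token followed by one world-model token predicting its likely outcome. This is a minimal illustration, assuming a made-up action vocabulary and stand-in heads (`propose_action`, `predict_outcome`); none of these names come from the paper.

```python
# Illustrative sketch of interleaving action-proposal and world-model tokens.
# Vocabularies and heads below are invented for demonstration only.

ACTION_VOCAB = ["keep_lane", "brake", "yield", "turn_left"]   # action-proposal tokens
WORLD_VOCAB = ["clear", "pedestrian_ahead", "cut_in"]         # world-model outcome tokens

def propose_action(step):
    # Stand-in for the policy head: cycle through proposals deterministically.
    return ACTION_VOCAB[step % len(ACTION_VOCAB)]

def predict_outcome(action):
    # Stand-in for the world-model head: map an action to a likely outcome token.
    outcomes = {"keep_lane": "clear", "brake": "pedestrian_ahead",
                "yield": "cut_in", "turn_left": "clear"}
    return outcomes[action]

def latent_chain(num_steps):
    """Build the interleaved reasoning sequence: action token, then world token."""
    chain = []
    for t in range(num_steps):
        action = propose_action(t)
        chain.append(("action", action))          # proposal from the action vocabulary
        chain.append(("world", predict_outcome(action)))  # predicted future outcome
    return chain

print(latent_chain(2))  # alternating ("action", ...) and ("world", ...) pairs
```

In the paper's actual setting these tokens would be latent embeddings produced by a learned model, first supervised against ground-truth future rollouts ("cold start") and then refined with closed-loop reinforcement learning; the fixed lookup tables here only convey the sequence structure.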