On-Policy Distillation of Language Models for Autonomous Vehicle Motion Planning

arXiv cs.RO / 4/10/2026


Key Points

  • The paper addresses how to deploy LLM-based autonomous vehicle motion planners on resource-constrained onboard systems by distilling knowledge from a large teacher model to a smaller student model.
  • It builds on GPT-Driver, which represents driving scenes as language prompts and generates waypoint trajectories with chain-of-thought reasoning.
  • Two student training approaches are compared: on-policy generalized knowledge distillation (GKD) using dense token-level feedback from the teacher on the student’s own outputs, and a dense-feedback reinforcement learning (RL) baseline using teacher log-probabilities as per-token reward signals.
  • Experiments on the nuScenes benchmark show that on-policy GKD significantly outperforms the RL baseline and achieves near-teacher-level performance with a model about 5× smaller.
  • The authors conclude that on-policy distillation is a principled and effective method for making LLM-based planners practical for real autonomous driving deployments.
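The on-policy GKD objective in the third bullet (dense token-level feedback from the teacher on the student's own outputs) can be sketched as a per-token divergence between teacher and student next-token distributions along a student-sampled sequence. This is a minimal illustration assuming a forward-KL loss; the paper's exact divergence choice (GKD formulations often use a generalized Jensen-Shannon divergence) is not stated here, and the function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gkd_token_loss(student_logits, teacher_logits):
    """Dense distillation loss on a student-generated sequence.

    Both inputs have shape (seq_len, vocab): the logits each model
    assigns at every position of the student's own rollout.  The loss
    is the per-token KL(teacher || student), averaged over positions,
    so the student gets a learning signal at every token rather than
    a single sequence-level score.
    """
    p_t = softmax(teacher_logits)
    log_p_t = np.log(p_t)
    log_p_s = np.log(softmax(student_logits))
    kl_per_token = (p_t * (log_p_t - log_p_s)).sum(axis=-1)
    return kl_per_token.mean()
```

The "on-policy" part is that `student_logits` and `teacher_logits` are both evaluated on sequences the student itself sampled, so the teacher corrects the student exactly where the student's own generations go wrong.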

Abstract

Large language models (LLMs) have recently demonstrated strong potential for autonomous vehicle motion planning by reformulating trajectory prediction as a language generation problem. However, deploying capable LLMs in resource-constrained onboard systems remains a fundamental challenge. In this paper, we study how to effectively transfer motion planning knowledge from a large teacher LLM to a smaller, more deployable student model. We build on the GPT-Driver framework, which represents driving scenes as language prompts and generates waypoint trajectories with chain-of-thought reasoning, and investigate two student training paradigms: (i) on-policy generalized knowledge distillation (GKD), which trains the student on its own self-generated outputs using dense token-level feedback from the teacher, and (ii) a dense-feedback reinforcement learning (RL) baseline that uses the teacher's log-probabilities as per-token reward signals in a policy gradient framework. Experiments on the nuScenes benchmark show that GKD substantially outperforms the RL baseline and closely approaches teacher-level performance despite a 5× reduction in model size. These results highlight the practical value of on-policy distillation as a principled and effective approach to deploying LLM-based planners in autonomous driving systems.
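The RL baseline described in the abstract scores each token the student actually emitted by the teacher's log-probability of that token, giving a dense per-token reward for a policy-gradient update. The sketch below shows only the reward computation, under the assumption that the reward is the raw teacher log-probability (any baseline subtraction or discounting used in the paper is omitted); the function names are illustrative.

```python
import numpy as np

def log_softmax(logits):
    # Stable log-softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def dense_rl_rewards(teacher_logits, sampled_tokens):
    """Per-token rewards for the policy-gradient baseline.

    teacher_logits: (seq_len, vocab) teacher logits at each position
                    of the student's sampled sequence.
    sampled_tokens: (seq_len,) token ids the student actually emitted.

    Returns r_t = log p_teacher(y_t | context): near zero when the
    teacher would likely have produced the same token, strongly
    negative when the student's choice is improbable under the teacher.
    """
    lp = log_softmax(teacher_logits)
    return lp[np.arange(len(sampled_tokens)), sampled_tokens]
```

Unlike the distillation loss, this reward sees only the one token the student sampled, not the teacher's full distribution at each step, which is one intuition for why the paper finds GKD's richer token-level feedback more effective.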