Orion-Lite: Distilling LLM Reasoning into Efficient Vision-Only Driving Models

arXiv cs.CV / 4/10/2026


Key Points

  • The paper proposes Orion-Lite, a compact vision-only driving model that distills reasoning knowledge from large vision-language-action (VLA) systems to reduce latency and energy costs for deployment.
  • It advances prior distillation work by targeting more complex, interactive scenarios and evaluating under closed-loop driving conditions rather than only simple or open-loop tests.
  • The method combines latent feature distillation with ground-truth trajectory supervision to preserve effective planning and control behaviors in the smaller student model.
  • Orion-Lite is reported to outperform its larger VLA teacher (ORION) and achieve a new state-of-the-art on the Bench2Drive benchmark, with a Driving Score of 80.6.
  • The authors conclude that vision-only architectures can deliver strong reactive planning performance and may represent an untapped path for high-efficiency autonomous driving.

Abstract

Leveraging the general world knowledge of Large Language Models (LLMs) holds significant promise for improving the ability of autonomous driving systems to handle rare and complex scenarios. While integrating LLMs into Vision-Language-Action (VLA) models has yielded state-of-the-art performance, their massive parameter counts pose severe challenges for latency-sensitive and energy-efficient deployment. Distilling LLM knowledge into a compact driving model offers a compelling way to retain these reasoning capabilities while maintaining a manageable computational footprint. Although previous works have demonstrated the efficacy of distillation, these efforts have primarily focused on relatively simple scenarios and open-loop evaluations. In this work, we therefore investigate LLM distillation in more complex, interactive scenarios under closed-loop evaluation. We demonstrate that, through a combination of latent feature distillation and ground-truth trajectory supervision, an efficient vision-only student model, Orion-Lite, can even surpass its massive VLA teacher, ORION, setting a new state-of-the-art on the rigorous Bench2Drive benchmark with a Driving Score of 80.6. Ultimately, this reveals that vision-only architectures still possess significant, untapped potential for high-performance reactive planning.
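The abstract names the two training signals but not their exact form. A minimal sketch of how a latent feature distillation term can be combined with ground-truth trajectory supervision in one objective (the function names, MSE losses, and the weighting `lam` are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))

def distillation_loss(student_feat, teacher_feat, pred_traj, gt_traj, lam=1.0):
    """Combined objective: imitate the teacher's latents + fit the expert trajectory.

    lam balances feature distillation against trajectory supervision;
    the actual weighting used in the paper is not stated here.
    """
    l_feat = mse(student_feat, teacher_feat)  # latent feature distillation
    l_traj = mse(pred_traj, gt_traj)          # ground-truth trajectory supervision
    return l_traj + lam * l_feat

# Toy example: a 256-d latent and 6 future (x, y) waypoints.
rng = np.random.default_rng(0)
s_feat, t_feat = rng.normal(size=256), rng.normal(size=256)
pred, gt = rng.normal(size=(6, 2)), rng.normal(size=(6, 2))
loss = distillation_loss(s_feat, t_feat, pred, gt, lam=0.5)
```

In this setup the student receives gradient signal both from the teacher's intermediate representations and directly from expert trajectories, which is how a compact student can in principle correct teacher mistakes rather than merely copy them.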