Large Language Model based Interactive Decision-Making for Autonomous Driving

arXiv cs.RO / 4/28/2026


Key Points

  • The paper proposes a Large Language Model (LLM)-based interactive decision-making framework for autonomous driving in high-conflict mixed-traffic scenarios with human-driven and autonomous vehicles.
  • It uses Object-Process Methodology to semantically model multi-vehicle scenes, converting low-level perception into objects, processes, and relations to reason over latent causal structure more effectively.
  • The LLM extracts explicit and implicit intents from surrounding agents and selects candidate maneuvers under jointly enforced safety and efficiency constraints.
  • The system generates and evaluates perturbed trajectory candidates using Monte Carlo sampling to produce an optimized executable trajectory.
  • For transparency and coordination, the final decision is converted into concise natural-language messages by the LLM and broadcast through an external human-machine interface; simulator experiments show improved safety, comfort, and efficiency, and a Turing-test-style evaluation indicates human-like decision making.
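The candidate-generation and evaluation step in the points above (perturb a nominal trajectory via Monte Carlo sampling, then score and select) could be sketched roughly as follows. The Gaussian noise model, the safety and comfort cost terms, and the weights are illustrative assumptions, not the paper's actual formulation.

```python
import random
import math

def generate_candidates(base_traj, n_samples=50, sigma=0.3):
    """Monte Carlo sampling: perturb a nominal trajectory (list of (x, y)
    waypoints) with Gaussian noise to produce candidate trajectories."""
    candidates = []
    for _ in range(n_samples):
        perturbed = [(x + random.gauss(0, sigma), y + random.gauss(0, sigma))
                     for x, y in base_traj]
        candidates.append(perturbed)
    return candidates

def evaluate(traj, obstacles, w_safety=1.0, w_comfort=0.1):
    """Score a candidate: penalize proximity to obstacles (safety proxy)
    and second-difference magnitude (comfort proxy). Lower cost is better."""
    safety_cost = 0.0
    for px, py in traj:
        for ox, oy in obstacles:
            d = math.hypot(px - ox, py - oy)
            safety_cost += 1.0 / max(d, 1e-3)  # grows sharply near obstacles
    comfort_cost = sum(
        math.hypot(x2 - 2 * x1 + x0, y2 - 2 * y1 + y0)  # discrete curvature change
        for (x0, y0), (x1, y1), (x2, y2) in zip(traj, traj[1:], traj[2:]))
    return w_safety * safety_cost + w_comfort * comfort_cost

def best_trajectory(base_traj, obstacles, n_samples=50):
    """Sample perturbed candidates and return the lowest-cost one."""
    candidates = generate_candidates(base_traj, n_samples)
    return min(candidates, key=lambda t: evaluate(t, obstacles))
```

In the real system the cost function would also encode the efficiency constraints and the maneuver chosen by the LLM; this sketch only shows the sampling-and-selection skeleton.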

Abstract

In high-conflict mixed-traffic scenarios involving human-driven and autonomous vehicles, most existing autonomous driving systems default to overly conservative behaviors, lack proactive interaction, and consequently suffer from limited public acceptance. To mitigate intent misunderstandings and decision failures, we present a Large Language Model-based interactive decision-making framework that augments scene understanding and intent-aware interaction to jointly improve safety and efficiency. The approach uses Object-Process Methodology to semantically model complex multi-vehicle scenes, abstracting low-level perceptual data into objects, processes, and relations, thereby streamlining reasoning over latent causal structure. Building on this representation, the Large Language Model parses both explicit and implicit intents of surrounding agents and, under jointly enforced safety and efficiency constraints, selects candidate maneuvers. We further generate perturbed trajectory candidates via Monte Carlo sampling and evaluate them to obtain an optimized executable trajectory. To foster transparency and coordination with nearby road users, the final decision is translated by the Large Language Model into concise natural-language messages and broadcast through an external Human-Machine Interface, completing a closed loop from scene understanding to action to language. Experiments in a cluster driving simulator demonstrate that the proposed method outperforms traditional baselines across safety, comfort, and efficiency metrics, while a Turing-test-style evaluation indicates a high degree of human-likeness in decision making. Overall, these results suggest that coupling semantic scene abstraction with Large Language Model-mediated intent reasoning and language-based eHMI communication offers a practical pathway toward interactive, trustworthy autonomous driving in dense mixed traffic.
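The object, process, and relation abstraction the abstract describes could look something like the following minimal sketch, where a scene graph is rendered as compact text for the LLM to reason over. The class names, fields, and text format here are assumptions for illustration, not the paper's actual OPM schema or prompt design.

```python
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    name: str        # e.g. "vehicle_2" (hypothetical identifier)
    attributes: dict  # e.g. {"speed_mps": 8.4, "lane": 1}

@dataclass
class SceneProcess:
    name: str        # e.g. "merging"
    actor: str       # object performing the process
    target: str = ""  # optional affected object; defaults to ego vehicle

@dataclass
class SceneGraph:
    """Semantic scene abstraction: objects, processes, and relations
    distilled from low-level perception."""
    objects: list = field(default_factory=list)
    processes: list = field(default_factory=list)
    relations: list = field(default_factory=list)  # (subject, predicate, object)

    def to_prompt(self) -> str:
        """Render the abstraction as compact text an LLM can reason over."""
        lines = [f"Object {o.name}: {o.attributes}" for o in self.objects]
        lines += [f"Process {p.name}: {p.actor} -> {p.target or 'ego'}"
                  for p in self.processes]
        lines += [f"Relation: {s} {pred} {o}" for s, pred, o in self.relations]
        return "\n".join(lines)
```

For example, a cut-in scene might be encoded as one `SceneObject` per vehicle, a `merging` process attributed to the cutting-in vehicle, and an `ahead_of` relation to the ego vehicle; the resulting text replaces raw perception output in the LLM's context.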