General-purpose LLMs as Models of Human Driver Behavior: The Case of Simplified Merging
arXiv cs.AI / 4/14/2026
Key Points
- The paper evaluates whether general-purpose LLMs can act as standalone models of human driver behavior in a simplified 1D merging scenario used for virtual AV safety assessment.
- It embeds two closed-loop LLM driver agents—OpenAI o3 and Google Gemini 2.5 Pro—in the simulation and compares their quantitative and qualitative behavior against human driving data.
- The LLMs reproduce some human-like traits, including intermittent control and tactical dependencies on spatial cues.
- However, both models fail to consistently capture human responses to dynamic velocity cues, leading to sharp divergences in safety performance versus human data.
- A prompt ablation study shows that prompt components act as model-specific inductive biases that do not transfer across LLMs, limiting prompt portability and raising validity and failure-mode concerns for LLM-based driver models.
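The closed-loop setup described above can be sketched as a simple control loop: at each step, the simulator serializes the ego vehicle's state into a natural-language prompt, the LLM replies with an acceleration command, and the 1D dynamics advance. The sketch below is a minimal illustration, not the paper's actual protocol; `mock_llm_policy` is a hypothetical stand-in for a real API call to o3 or Gemini 2.5 Pro, and all numeric parameters are invented.

```python
def mock_llm_policy(prompt: str) -> str:
    """Hypothetical stand-in for an LLM driver agent (e.g., o3 or Gemini 2.5 Pro).
    Returns an acceleration command as text, as a real model reply would."""
    # Toy heuristic: brake when the gap to the lead vehicle shrinks below a threshold.
    gap = float(prompt.split("gap=")[1].split()[0])
    return "-1.0" if gap < 20.0 else "0.5"


def closed_loop_merge(steps: int = 50, dt: float = 0.5) -> float:
    """Simplified 1D scenario: an ego vehicle follows a lead vehicle toward a merge point.
    Returns the final gap between lead and ego."""
    ego_x, ego_v = 0.0, 10.0    # ego position (m) and speed (m/s)
    lead_x, lead_v = 30.0, 8.0  # lead vehicle position and constant speed
    for _ in range(steps):
        gap = lead_x - ego_x
        # Serialize the observable state into a prompt for the driver agent.
        prompt = (
            f"You are a human driver in a merge. "
            f"gap={gap:.1f} ego_v={ego_v:.1f} lead_v={lead_v:.1f}. "
            f"Reply with a longitudinal acceleration in m/s^2."
        )
        accel = float(mock_llm_policy(prompt))  # parse the agent's textual reply
        ego_v = max(0.0, ego_v + accel * dt)    # update ego speed (no reversing)
        ego_x += ego_v * dt
        lead_x += lead_v * dt
    return lead_x - ego_x


final_gap = closed_loop_merge()
```

Because the toy policy reacts only to the spatial `gap` cue and ignores the velocity cues in the prompt, it mirrors the failure mode the paper reports: plausible tactical behavior from spatial information, but no consistent response to dynamic velocity information.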