Automating Crash Diagram Generation Using Vision-Language Models: A Case Study on Multi-Lane Roundabouts

arXiv cs.AI / 4/20/2026


Key Points

  • The study explores using Vision-Language Models (VLMs) to automatically generate crash diagrams from police crash reports, targeting the difficult scenario of multi-lane roundabouts.
  • It proposes a three-stage structured prompting approach (interpretation, extraction, and visual synthesis) and introduces a 10-metric evaluation rubric covering semantic accuracy, spatial fidelity, and visual clarity.
  • On 79 crash reports, GPT-4o achieved the best average score (6.29/10), ahead of Gemini-1.5-Flash (5.28) and Janus-4o (3.64).
  • The results indicate that stronger spatial reasoning yields better alignment between the crash details extracted from a report and the diagram rendered from them, while also exposing the models' current limitations on engineering visualization tasks.
  • The authors argue the work can support integrating generative AI into transportation safety analysis workflows to increase efficiency, consistency, and interpretability.
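The three-stage structure above can be sketched as a simple pipeline. This is an illustrative assumption, not the paper's actual code: the stage names (interpretation, extraction, visual synthesis) come from the summary, while the `call_vlm` helper, prompt wording, and `CrashReport` fields are hypothetical placeholders for a real VLM API.

```python
# Hypothetical sketch of a three-stage structured prompting pipeline.
# Stage names follow the study; all identifiers and prompts are illustrative.
from dataclasses import dataclass


@dataclass
class CrashReport:
    narrative: str  # officer's free-text crash description
    location: str   # e.g. "two-lane roundabout, northbound approach"


def call_vlm(prompt: str) -> str:
    """Placeholder for a real VLM call (GPT-4o, Gemini-1.5-Flash, etc.)."""
    return f"<model response to: {prompt[:40]}...>"


def generate_crash_diagram(report: CrashReport) -> str:
    # Stage 1: interpretation -- have the model restate the crash scenario.
    interpretation = call_vlm(
        "Interpret this crash report and summarize the sequence of events:\n"
        + report.narrative
    )
    # Stage 2: extraction -- pull structured fields (vehicles, lanes, impact point).
    extraction = call_vlm(
        "From this interpretation, extract vehicles, travel lanes, movements, "
        "and point of impact as structured fields:\n" + interpretation
    )
    # Stage 3: visual synthesis -- request the diagram conditioned on the facts.
    return call_vlm(
        "Draw a crash diagram of a " + report.location
        + " consistent with these extracted facts:\n" + extraction
    )


diagram = generate_crash_diagram(
    CrashReport(
        narrative="Vehicle 1 entered the roundabout and struck Vehicle 2 changing lanes.",
        location="two-lane roundabout",
    )
)
```

Chaining each stage's output into the next stage's prompt is what makes the prompting "structured": errors in extraction become visible before any drawing is attempted.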

Abstract

Crash diagrams are essential tools in transportation safety analysis, yet their manual preparation remains time-consuming and prone to human variability. This study investigates the use of Vision-Language Models (VLMs) to automate crash diagram generation from police crash reports, focusing on multi-lane roundabouts as a challenging test case. A three-part structured prompt framework was developed to guide model reasoning through interpretation, extraction, and visual synthesis, while a 10-metric evaluation system was designed to assess diagram quality in terms of semantic accuracy, spatial fidelity, and visual clarity. Three popular models (GPT-4o, Gemini-1.5-Flash, and Janus-4o) were tested on 79 crash reports. GPT-4o achieved the highest average performance (6.29 out of 10), followed by Gemini-1.5-Flash (5.28) and Janus-4o (3.64). The analysis revealed GPT-4o's superior spatial reasoning and alignment between extracted and visualized crash data. These results highlight both the promise and current limitations of VLMs in engineering visualization tasks. The study lays the groundwork for integrating generative AI into crash analysis workflows to improve efficiency, consistency, and interpretability.
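A 10-metric rubric averaged into a single 0-10 score, as reported above, can be illustrated with a small sketch. The three metric groups come from the summary; the individual metric names, the example scores, and the unweighted-average aggregation are assumptions made for illustration only.

```python
# Illustrative 10-metric rubric grouped into the study's three quality
# dimensions. Metric names and equal weighting are invented assumptions.
RUBRIC = {
    "semantic_accuracy": ["vehicle_count", "collision_type", "event_sequence"],
    "spatial_fidelity": ["lane_placement", "approach_geometry",
                         "impact_location", "trajectories"],
    "visual_clarity": ["legibility", "labeling", "layout"],
}


def rubric_score(scores: dict[str, float]) -> float:
    """Average all ten per-metric scores (each on a 0-10 scale)."""
    metrics = [m for group in RUBRIC.values() for m in group]
    return sum(scores[m] for m in metrics) / len(metrics)


# Example: nine metrics at 6.0 and one at 9.0 average to 6.3.
example = {m: 6.0 for group in RUBRIC.values() for m in group}
example["lane_placement"] = 9.0
print(round(rubric_score(example), 2))  # → 6.3
```

Reporting one aggregate number per diagram, as the study does, makes model comparisons simple, though per-group averages would show whether a model fails on spatial fidelity specifically.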