Automating Crash Diagram Generation Using Vision-Language Models: A Case Study on Multi-Lane Roundabouts
arXiv cs.AI / 4/20/2026
Key Points
- The study explores using Vision-Language Models (VLMs) to automatically generate crash diagrams from police crash reports, targeting the difficult scenario of multi-lane roundabouts.
- It proposes a three-stage structured prompting approach (interpretation, extraction, and visual synthesis) and introduces a 10-metric evaluation rubric covering semantic accuracy, spatial fidelity, and visual clarity.
- Across 79 crash reports tested with GPT-4o, Gemini-1.5-Flash, and Janus-4o, GPT-4o achieved the highest average score (6.29/10).
- The results suggest that stronger spatial reasoning improves the alignment between extracted crash details and the rendered diagrams, while also exposing the current limitations of VLMs for engineering visualization tasks.
- The authors argue the work can support integrating generative AI into transportation safety analysis workflows to increase efficiency, consistency, and interpretability.
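The three-stage prompting chain (interpretation, extraction, visual synthesis) and the 10-metric rubric can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The model is a hypothetical plain callable standing in for any VLM client. The prompt wording, the metric names, the 4/4/2 split across semantic accuracy, spatial fidelity, and visual clarity, and the choice to score each metric in [0, 1] so totals land on a 0-10 scale are all illustrative guesses.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical stand-in for any VLM client: takes a prompt, returns text.
ModelFn = Callable[[str], str]

# Illustrative prompt templates for the three stages (not the paper's wording).
STAGE_PROMPTS = {
    "interpretation": "Summarize what happened in this crash report:\n{input}",
    "extraction": "List vehicles, lanes, entry/exit legs, and movements from:\n{input}",
    "visual_synthesis": "Draw a multi-lane roundabout crash diagram showing:\n{input}",
}
STAGES = ("interpretation", "extraction", "visual_synthesis")


def run_pipeline(report: str, model: ModelFn) -> Dict[str, str]:
    """Chain the three stages, feeding each stage's output into the next."""
    outputs: Dict[str, str] = {}
    current = report
    for stage in STAGES:
        current = model(STAGE_PROMPTS[stage].format(input=current))
        outputs[stage] = current
    return outputs


# Hypothetical grouping of 10 metrics into the three categories the study names.
RUBRIC = {
    "semantic_accuracy": ["vehicle_count", "vehicle_types", "crash_type", "event_sequence"],
    "spatial_fidelity": ["lane_positions", "entry_exit_legs", "travel_direction", "impact_point"],
    "visual_clarity": ["legibility", "labeling"],
}
METRICS = [m for group in RUBRIC.values() for m in group]


def rubric_score(scores: Dict[str, float]) -> float:
    """Sum per-metric scores (each assumed in [0, 1]) into a 0-10 total."""
    missing = set(METRICS) - scores.keys()
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    for m in METRICS:
        if not 0.0 <= scores[m] <= 1.0:
            raise ValueError(f"{m} must be in [0, 1], got {scores[m]}")
    return sum(scores[m] for m in METRICS)


def average_score(per_report: List[Dict[str, float]]) -> float:
    """Mean rubric total across reports, comparable to a per-model average."""
    return mean(rubric_score(s) for s in per_report)


if __name__ == "__main__":
    # Stub model that just echoes the last line of its prompt.
    echo = lambda prompt: prompt.splitlines()[-1]
    print(run_pipeline("Vehicle 1 changed lanes in the circulatory roadway.", echo))
```

Chaining the stages this way makes each intermediate output inspectable, which is one plausible reason a structured multi-stage prompt could help separate extraction errors from rendering errors.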



