DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant
arXiv cs.AI / April 15, 2026
Key Points
- The paper reports results from the first Large Language Model (LLM) Testing competition at the DeepTest workshop during ICSE 2026.
- Four competing tools were benchmarked against an LLM-based automotive assistant tasked with retrieving information from a car manual and correctly mentioning relevant warnings.
- The competition focused on finding user inputs for which the system fails to appropriately surface warnings, with evaluation metrics centered on failure-finding effectiveness and test diversity.
- The report details the experimental methodology, describes the participating competitor tools, and summarizes the comparative outcomes of their performance.
