Fail2Drive: Benchmarking Closed-Loop Driving Generalization

arXiv cs.RO / 4/10/2026


Key Points

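Fail2Drive is a CARLA benchmark that quantitatively evaluates how well closed-loop autonomous driving generalizes to conditions that differ from training (distribution shift), without reusing training scenarios at test time.
It comprises 200 routes and 17 new scenario classes (appearance, layout, behavioral, and robustness shifts), and pairs each shifted route with a matching in-distribution counterpart to isolate the effect of the shift.
Evaluating multiple state-of-the-art models showed consistent degradation in generalization, including an average success-rate drop of 22.8%.
The analysis revealed unexpected failure modes, such as ignoring objects clearly visible in the LiDAR and failing to learn basic concepts such as free and occupied space.
An open-source toolbox supports creating new scenarios and validating their solvability with an expert policy; code, data, and tools are released to provide a reproducible research foundation.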

Abstract

Generalization under distribution shift remains a central bottleneck for closed-loop autonomous driving. Although simulators like CARLA enable safe and scalable testing, existing benchmarks rarely measure true generalization: they typically reuse training scenarios at test time. Success can therefore reflect memorization rather than robust driving behavior. We introduce Fail2Drive, the first paired-route benchmark for closed-loop generalization in CARLA, with 200 routes and 17 new scenario classes spanning appearance, layout, behavioral, and robustness shifts. Each shifted route is matched with an in-distribution counterpart, isolating the effect of the shift and turning qualitative failures into quantitative diagnostics. Evaluating multiple state-of-the-art models reveals consistent degradation, with an average success-rate drop of 22.8%. Our analysis uncovers unexpected failure modes, such as ignoring objects clearly visible in the LiDAR and failing to learn the fundamental concepts of free and occupied space. To accelerate follow-up work, Fail2Drive includes an open-source toolbox for creating new scenarios and validating solvability via a privileged expert policy. Together, these components establish a reproducible foundation for benchmarking and improving closed-loop driving generalization. We open-source all code, data, and tools at https://github.com/autonomousvision/fail2drive .
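
The paired-route idea in the abstract can be illustrated with a small sketch. It is not the Fail2Drive toolbox or its data format: the `RoutePair` record, its field names, and the sample outcomes are hypothetical, and the drop is computed here as a difference in percentage points, which is one plausible reading of the reported 22.8% figure rather than a confirmed definition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutePair:
    """One shifted route and its in-distribution counterpart (hypothetical record)."""
    shift_type: str          # e.g. "appearance", "layout", "behavioral", "robustness"
    in_dist_success: bool    # did the model complete the in-distribution route?
    shifted_success: bool    # did the model complete the shifted route?


def success_rate(outcomes):
    """Return the success rate in percent for an iterable of booleans."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no outcomes to aggregate")
    return 100.0 * sum(outcomes) / len(outcomes)


def paired_drop(pairs):
    """In-distribution success rate minus shifted success rate, in percentage points.

    Assumption: an absolute difference; a relative drop would divide by the baseline.
    """
    baseline = success_rate(p.in_dist_success for p in pairs)
    shifted = success_rate(p.shifted_success for p in pairs)
    return baseline - shifted


def drop_by_shift_type(pairs):
    """Compute the paired drop separately for each shift type."""
    grouped = defaultdict(list)
    for pair in pairs:
        grouped[pair.shift_type].append(pair)
    return {name: paired_drop(group) for name, group in grouped.items()}


# Made-up outcomes for illustration only; these are not results from the paper.
sample = [
    RoutePair("appearance", True, True),
    RoutePair("appearance", True, False),
    RoutePair("layout", True, False),
    RoutePair("layout", False, False),
]

print(paired_drop(sample))          # 50.0
print(drop_by_shift_type(sample))   # {'appearance': 50.0, 'layout': 50.0}
```

Because each shifted route has a counterpart that differs only in the shift, the difference between the two success rates can be attributed to the shift itself rather than to route difficulty, which is what makes the per-category breakdown diagnostic.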


