Risk-Controllable Multi-View Diffusion for Driving Scenario Generation

arXiv cs.CV / 3/13/2026

Key Points

RiskMV-DPO is a general pipeline enabling physically-informed, risk-controllable generation of multi-view driving scenarios by conditioning diffusion-based video synthesis on target risk levels and grounded risk modeling.
The approach adds a geometry-appearance alignment module and a region-aware direct preference optimization (RA-DPO) with motion-aware masking to ensure spatial-temporal coherence and focus learning on dynamic regions.
On the nuScenes dataset, RiskMV-DPO generates diverse long-tail scenarios while achieving state-of-the-art visual quality, increasing 3D detection mAP from 18.17 to 30.50 and reducing FID to 15.70.
This work shifts world models from passive environment prediction to proactive, risk-controllable synthesis, offering a scalable toolchain for safety-oriented embodied intelligence development.

Abstract

Generating safety-critical driving scenarios is crucial for evaluating and improving autonomous driving systems, but long-tail risky situations are rarely observed in real-world data and difficult to specify through manual scenario design. Existing generative approaches typically treat risk as an after-the-fact label and struggle to maintain geometric consistency in multi-view driving scenes. We present RiskMV-DPO, a general and systematic pipeline for physically-informed, risk-controllable multi-view scenario generation. By integrating target risk levels with physically-grounded risk modeling, we autonomously synthesize diverse and high-stakes dynamic trajectories that serve as explicit geometric anchors for a diffusion-based video generator. To ensure spatial-temporal coherence and geometric fidelity, we introduce a geometry-appearance alignment module and a region-aware direct preference optimization (RA-DPO) strategy with motion-aware masking to focus learning on localized dynamic regions. Experiments on the nuScenes dataset show that RiskMV-DPO can freely generate a wide spectrum of diverse long-tail scenarios while maintaining state-of-the-art visual quality, improving 3D detection mAP from 18.17 to 30.50 and reducing FID to 15.70. Our work shifts the role of world models from passive environment prediction to proactive, risk-controllable synthesis, providing a scalable toolchain for the safety-oriented development of embodied intelligence.
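
The abstract names region-aware DPO with motion-aware masking but does not state the objective. The sketch below is therefore only an illustration of the general idea, not the authors' implementation. It assumes a Diffusion-DPO-style preference loss on noise-prediction errors, and it assumes the motion mask comes from simple frame differencing. All function names, the `threshold`, `floor`, and `beta` parameters, and the array shapes are hypothetical.

```python
import numpy as np


def motion_mask(frames, threshold=0.1):
    """Binary mask of dynamic pixels from frame differencing (an assumption).

    frames: array of shape (T, H, W) with values in [0, 1].
    Returns a float array of the same shape, 1.0 where the absolute
    change between consecutive frames exceeds `threshold`.
    """
    diff = np.abs(np.diff(frames, axis=0))           # shape (T-1, H, W)
    diff = np.concatenate([diff[:1], diff], axis=0)  # reuse first diff for frame 0
    return (diff > threshold).astype(np.float64)


def masked_error(eps_true, eps_pred, mask, floor=0.1):
    """Weighted mean squared noise-prediction error.

    Static pixels keep a small weight `floor` so they are not ignored
    entirely; dynamic pixels (mask == 1) get weight 1.
    """
    weights = floor + (1.0 - floor) * mask
    return float(np.sum(weights * (eps_true - eps_pred) ** 2) / np.sum(weights))


def region_weighted_dpo_loss(eps_w, pred_w, ref_w, mask_w,
                             eps_l, pred_l, ref_l, mask_l, beta=1.0):
    """Diffusion-DPO-style loss with region-weighted errors (hypothetical).

    *_w: preferred ("winner") sample, *_l: rejected ("loser") sample.
    pred_* come from the model being trained, ref_* from a frozen reference.
    """
    # How much worse (positive) or better (negative) the model is than the
    # reference on each sample, measured mostly inside dynamic regions.
    gain_w = masked_error(eps_w, pred_w, mask_w) - masked_error(eps_w, ref_w, mask_w)
    gain_l = masked_error(eps_l, pred_l, mask_l) - masked_error(eps_l, ref_l, mask_l)
    logit = -beta * (gain_w - gain_l)
    # -log(sigmoid(logit)) written in a numerically stable form.
    return float(np.logaddexp(0.0, -logit))
```

Under these assumptions, the loss equals log 2 when the trained model matches the reference on both samples, and it drops below log 2 once the model denoises the preferred sample better than the reference does within the masked regions.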
