Steerable Adversarial Scenario Generation through Test-Time Preference Alignment

arXiv cs.RO / 5/6/2026


Key Points

  • The paper reframes adversarial scenario generation for autonomous driving safety as a multi-objective preference alignment problem, addressing the limitation of existing methods that rely on a single fixed trade-off between adversariality and realism.
  • It introduces SAGE (Steerable Adversarial scenario GEnerator), which allows fine-grained control of the adversariality–realism balance at test time without any retraining.
  • SAGE uses hierarchical group-based offline preference optimization to learn balanced behavior by separating hard feasibility constraints from soft preferences, improving data efficiency.
  • Rather than producing a single fixed model, SAGE fine-tunes two expert models with opposing preferences and constructs a continuous range of policies at inference time by linearly interpolating their weights (see the sketch after this list).
  • Experiments and a theoretical analysis (via linear mode connectivity) show that SAGE generates better-balanced scenarios and also supports more effective closed-loop training of driving policies.
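
A minimal sketch of the test-time steering step, assuming two PyTorch experts that share one architecture; the function name, the `alpha` convention, and the model handles are illustrative, not the authors' API:

```python
import copy
import torch.nn as nn

def interpolate_policies(adversarial_expert, realistic_expert, alpha):
    """Blend two fine-tuned experts into a single steerable policy.

    alpha = 0.0 reproduces the adversarial expert, alpha = 1.0 the
    realistic one; intermediate values trade off the two objectives.
    """
    blended = copy.deepcopy(adversarial_expert)
    adv_state = adversarial_expert.state_dict()
    real_state = realistic_expert.state_dict()
    blended_state = {}
    for name, w_adv in adv_state.items():
        w_real = real_state[name]
        if w_adv.is_floating_point():
            # Linear interpolation in weight space.
            blended_state[name] = (1.0 - alpha) * w_adv + alpha * w_real
        else:
            # Integer buffers (e.g., batch-norm counters) are copied, not blended.
            blended_state[name] = w_adv
    blended.load_state_dict(blended_state)
    return blended

# Usage: sweep alpha at inference time, no retraining required.
adv_model, real_model = nn.Linear(4, 2), nn.Linear(4, 2)
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    policy = interpolate_policies(adv_model, real_model, alpha)
```

Linear mode connectivity is what makes this plausible: if the two fine-tuned experts sit in a connected low-loss region of weight space, every point on the straight line between them is itself a usable policy, so sweeping `alpha` traces a continuous spectrum of adversariality–realism trade-offs.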

Abstract

Adversarial scenario generation is a cost-effective approach for safety assessment of autonomous driving systems. However, existing methods are often constrained to a single, fixed trade-off between competing objectives such as adversariality and realism. This yields behavior-specific models that cannot be steered at inference time, lacking the efficiency and flexibility to generate tailored scenarios for diverse training and testing requirements. In view of this, we reframe the task of adversarial scenario generation as a multi-objective preference alignment problem and introduce a new framework named Steerable Adversarial scenario GEnerator (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining. We first propose hierarchical group-based preference optimization, a data-efficient offline alignment method that learns to balance competing objectives by decoupling hard feasibility constraints from soft preferences. Instead of training a fixed model, SAGE fine-tunes two experts on opposing preferences and constructs a continuous spectrum of policies at inference time by linearly interpolating their weights. We provide theoretical justification for this framework through the lens of linear mode connectivity. Extensive experiments demonstrate that SAGE not only generates scenarios with a superior balance of adversariality and realism but also enables more effective closed-loop training of driving policies. Project page: https://tongnie.github.io/SAGE/.
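
The hierarchical decoupling the abstract describes, with hard feasibility acting as a filter and soft preferences ranking the survivors within a group, can be pictured with a short sketch. Everything here is an assumption for illustration: the feasibility mask, the group scoring, and the softmax-weighted surrogate loss are stand-ins, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def hgpo_style_loss(log_probs, feasible, preference_scores, beta=1.0):
    """Group-based preference surrogate with a hard-constraint filter.

    log_probs:         (G,) policy log-likelihood of each scenario in a group
    feasible:          (G,) bool mask from hard constraints (e.g., kinematics)
    preference_scores: (G,) soft scores (e.g., adversariality or realism)
    """
    # Hard constraints first: infeasible scenarios get -inf, so the
    # softmax below assigns them zero target probability.
    scores = preference_scores.masked_fill(~feasible, float("-inf"))
    # Soft preferences second: push probability mass toward preferred
    # feasible scenarios within the group.
    target = F.softmax(beta * scores, dim=0)
    return -(target * log_probs).sum()

# Example: a group of 4 candidate scenarios, one kinematically infeasible.
lp = torch.randn(4, requires_grad=True)       # policy log-likelihoods
ok = torch.tensor([True, True, False, True])  # hard feasibility mask
sc = torch.tensor([0.9, 0.2, 5.0, 0.4])       # soft preference scores
hgpo_style_loss(lp, ok, sc).backward()
```

Training one expert with adversariality as the soft score and another with realism would then produce the two opposing-preference models that the interpolation sketch above blends at test time.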