Steerable Adversarial Scenario Generation through Test-Time Preference Alignment

arXiv cs.RO / 5/6/2026


Key Points

  • The paper reframes adversarial scenario generation for autonomous driving safety as a multi-objective preference alignment problem, addressing the limitation of existing methods that rely on a single fixed trade-off between adversariality and realism.
  • It introduces SAGE (Steerable Adversarial scenario GEnerator), which allows fine-grained control of the adversariality–realism balance at test time without any retraining.
  • SAGE uses hierarchical group-based offline preference optimization to learn balanced behavior by separating hard feasibility constraints from soft preferences, improving data efficiency.
  • Rather than producing a single fixed model, SAGE fine-tunes two expert models with opposing preferences and constructs a continuous range of policies at inference time by linearly interpolating their weights (see the sketch after this list).
  • Experiments and a theoretical analysis (via linear mode connectivity) show that SAGE generates better-balanced scenarios and also supports more effective closed-loop training of driving policies.
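
A minimal sketch of the test-time steering step, assuming two PyTorch experts that share one architecture; the function name, the `alpha` convention, and the model handles are illustrative, not the authors' API:

```python
import copy
import torch.nn as nn

def interpolate_policies(adversarial_expert, realistic_expert, alpha):
    """Blend two fine-tuned experts into a single steerable policy.

    alpha = 0.0 reproduces the adversarial expert, alpha = 1.0 the
    realistic one; intermediate values trade off the two objectives.
    """
    blended = copy.deepcopy(adversarial_expert)
    adv_state = adversarial_expert.state_dict()
    real_state = realistic_expert.state_dict()
    blended_state = {}
    for name, w_adv in adv_state.items():
        w_real = real_state[name]
        if w_adv.is_floating_point():
            # Linear interpolation in weight space.
            blended_state[name] = (1.0 - alpha) * w_adv + alpha * w_real
        else:
            # Integer buffers (e.g., batch-norm counters) are copied, not blended.
            blended_state[name] = w_adv
    blended.load_state_dict(blended_state)
    return blended

# Usage: sweep alpha at inference time, no retraining required.
adv_model, real_model = nn.Linear(4, 2), nn.Linear(4, 2)
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    policy = interpolate_policies(adv_model, real_model, alpha)
```

Linear mode connectivity is what makes this plausible: if the two fine-tuned experts sit in a connected low-loss region of weight space, every point on the straight line between them is itself a usable policy, so sweeping `alpha` traces a continuous spectrum of adversariality–realism trade-offs.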

Abstract

Adversarial scenario generation is a cost-effective approach for safety assessment of autonomous driving systems. However, existing methods are often constrained to a single, fixed trade-off between competing objectives such as adversariality and realism. This yields behavior-specific models that cannot be steered at inference time, lacking the efficiency and flexibility to generate tailored scenarios for diverse training and testing requirements. In view of this, we reframe the task of adversarial scenario generation as a multi-objective preference alignment problem and introduce a new framework named Steerable Adversarial scenario GEnerator (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining. We first propose hierarchical group-based preference optimization, a data-efficient offline alignment method that learns to balance competing objectives by decoupling hard feasibility constraints from soft preferences. Instead of training a fixed model, SAGE fine-tunes two experts on opposing preferences and constructs a continuous spectrum of policies at inference time by linearly interpolating their weights. We provide theoretical justification for this framework through the lens of linear mode connectivity. Extensive experiments demonstrate that SAGE not only generates scenarios with a superior balance of adversariality and realism but also enables more effective closed-loop training of driving policies. Project page: https://tongnie.github.io/SAGE/.
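
The hierarchical decoupling the abstract describes, with hard feasibility acting as a filter and soft preferences ranking the survivors within a group, can be pictured with a short sketch. Everything here is an assumption for illustration: the feasibility mask, the group scoring, and the softmax-weighted surrogate loss are stand-ins, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def hgpo_style_loss(log_probs, feasible, preference_scores, beta=1.0):
    """Group-based preference surrogate with a hard-constraint filter.

    log_probs:         (G,) policy log-likelihood of each scenario in a group
    feasible:          (G,) bool mask from hard constraints (e.g., kinematics)
    preference_scores: (G,) soft scores (e.g., adversariality or realism)
    """
    # Hard constraints first: infeasible scenarios get -inf, so the
    # softmax below assigns them zero target probability.
    scores = preference_scores.masked_fill(~feasible, float("-inf"))
    # Soft preferences second: push probability mass toward preferred
    # feasible scenarios within the group.
    target = F.softmax(beta * scores, dim=0)
    return -(target * log_probs).sum()

# Example: a group of 4 candidate scenarios, one kinematically infeasible.
lp = torch.randn(4, requires_grad=True)       # policy log-likelihoods
ok = torch.tensor([True, True, False, True])  # hard feasibility mask
sc = torch.tensor([0.9, 0.2, 5.0, 0.4])       # soft preference scores
hgpo_style_loss(lp, ok, sc).backward()
```

Training one expert with adversariality as the soft score and another with realism would then produce the two opposing-preference models that the interpolation sketch above blends at test time.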