SteeringDiffusion: A Bottlenecked Activation Control Interface for Diffusion Models

arXiv cs.CV / 5/5/2026



Key Points

SteeringDiffusion proposes a bottlenecked activation-level control interface for diffusion models, providing a smooth, monotonic, and runtime-adjustable knob for the content–style trade-off.
The approach freezes the U-Net backbone and learns only a small prompt-conditioned latent code that is projected to FiLM/AdaGN-style modulation parameters, with zero-initialization ensuring exact equivalence to the base model at zero control scale.
Timestep-aware gating limits where modulation is applied, restricting interventions to later denoising stages for more stable behavior.
At inference, a single scalar continuously traverses the learned control surface without retraining, and experiments on Stable Diffusion 1.5 and SDXL show improved controllability and stability versus LoRA under matched parameter budgets.
The paper also introduces a stability diagnostic based on DDIM inversion, used as a post-hoc trajectory probe, which reveals strong correlations between inversion stability and intervention magnitude.
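The mechanism described in the points above (a small latent code, a zero-initialized projection to FiLM-style scale and shift, a timestep gate, and a single inference-time scalar) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `BottleneckedFiLM`, the `tanh` encoder, the hard gate threshold `gate_start`, and the convention that `t` counts down from the noisiest step to 0 are all assumptions made for illustration.

```python
import numpy as np

class BottleneckedFiLM:
    """Hypothetical sketch: prompt embedding -> small code -> (gamma, beta)."""

    def __init__(self, prompt_dim, code_dim, channels, gate_start=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Bottleneck: map the prompt embedding to a small latent code.
        self.w_code = rng.normal(0.0, 0.02, size=(prompt_dim, code_dim))
        # Zero-initialized projection, so the module starts as an identity.
        self.w_mod = np.zeros((code_dim, 2 * channels))
        self.gate_start = gate_start  # assumed fraction of the trajectory

    def gate(self, t, num_steps):
        # Assumption: t runs from num_steps - 1 (noisiest) down to 0 (cleanest),
        # so "later denoising stages" means small t.
        progress = 1.0 - t / (num_steps - 1)
        return 1.0 if progress >= self.gate_start else 0.0

    def __call__(self, h, prompt_emb, t, num_steps, scale):
        # h: frozen backbone features with shape (batch, channels, height, width)
        code = np.tanh(prompt_emb @ self.w_code)            # (batch, code_dim)
        gamma, beta = np.split(code @ self.w_mod, 2, axis=-1)
        g = scale * self.gate(t, num_steps)                 # runtime knob
        gamma = gamma[:, :, None, None]
        beta = beta[:, :, None, None]
        return h * (1.0 + g * gamma) + g * beta


rng = np.random.default_rng(1)
h = rng.normal(size=(2, 4, 3, 3))
prompt_emb = rng.normal(size=(2, 8))
film = BottleneckedFiLM(prompt_dim=8, code_dim=2, channels=4)

# Stand-in for trained weights, so the scalar has a visible effect.
film.w_mod = rng.normal(size=film.w_mod.shape)
for scale in (0.0, 0.5, 1.0):
    out = film(h, prompt_emb, t=0, num_steps=10, scale=scale)
    print(scale, np.linalg.norm(out - h))
```

In this sketch the deviation from the frozen features is `scale * gate * (h * gamma + beta)`, so its norm grows linearly with the scalar and is exactly zero at `scale=0` or when the gate is closed. Whether the perceptual content–style trade-off is equally smooth in a real model is an empirical question the paper addresses.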

Abstract

We introduce SteeringDiffusion, a bottlenecked activation-level control interface for diffusion models that exposes a smooth, monotonic, and runtime-adjustable control surface over the content–style trade-off. Our method keeps the U-Net backbone frozen and learns a small, prompt-conditioned latent code projected to FiLM/AdaGN-style modulation parameters. A zero-initialized design guarantees exact equivalence to the base model at zero scale, while timestep-aware gating restricts modulation to later denoising stages. A single scalar at inference continuously traverses the control surface without retraining. Across experiments on Stable Diffusion 1.5 and SDXL covering multiple artistic styles, we show that SteeringDiffusion produces smooth and monotonic content–style trade-offs. Under matched parameter budgets, it outperforms LoRA in controllability and stability, while ControlNet and rank-1 adapters do not expose a comparable control surface. We further introduce an inversion-stability diagnostic based on DDIM inversion, used as a post-hoc trajectory probe, which reveals strong correlations with intervention magnitude. These results position Steering Bottlenecked Explicit Control (S-BEC) as a practical, general-purpose control interface for frozen diffusion backbones.
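The abstract does not spell out how the inversion-stability diagnostic is computed. One plausible reading, sketched below purely as an assumption, is a DDIM round trip: invert a clean sample to noise with the deterministic DDIM update, sample back, and report the relative reconstruction error. The function names, the toy noise predictor `make_toy_eps`, and the alpha schedule are hypothetical stand-ins, not the paper's setup.

```python
import numpy as np

def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM update between two cumulative-alpha levels."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def round_trip_error(x0, eps_fn, alphas):
    """Invert x0 to noise, sample back, and return the relative L2 error.

    alphas: cumulative alphas ordered from clean (near 1) to noisy (near 0).
    eps_fn(x, i): noise estimate at level i (stand-in for the steered U-Net).
    """
    x = x0.copy()
    # Inversion: clean -> noisy, reusing the estimate made at the current level.
    for i in range(len(alphas) - 1):
        x = ddim_step(x, eps_fn(x, i), alphas[i], alphas[i + 1])
    # Sampling: noisy -> clean.
    for i in range(len(alphas) - 1, 0, -1):
        x = ddim_step(x, eps_fn(x, i), alphas[i], alphas[i - 1])
    return float(np.linalg.norm(x - x0) / np.linalg.norm(x0))

def make_toy_eps(scale):
    """Hypothetical noise predictor whose state dependence grows with scale."""
    return lambda x, i: np.tanh((1.0 + scale) * x)

rng = np.random.default_rng(0)
x0 = rng.normal(size=(16,))
alphas = np.linspace(0.9999, 0.02, 50)
for scale in (0.0, 0.5, 1.0):
    print(scale, round_trip_error(x0, make_toy_eps(scale), alphas))
```

The round trip is exact only when the noise estimate is the same at both ends of each step, so the error measures how strongly the predictor's output changes along the trajectory. Relating that error to the control scale, as the paper reports, requires the real steered model; the toy predictor here only exercises the procedure.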