SteeringDiffusion: A Bottlenecked Activation Control Interface for Diffusion Models

arXiv cs.CV / 5/5/2026


Key Points

  • SteeringDiffusion proposes a bottlenecked activation-level control interface for diffusion models, providing a smooth, monotonic, and runtime-adjustable knob for the content–style trade-off.
  • The approach freezes the U-Net backbone and learns only a small prompt-conditioned latent code that is projected to FiLM/AdaGN-style modulation parameters, with zero-initialization ensuring exact equivalence to the base model at zero control scale.
  • Timestep-aware gating limits where modulation is applied, restricting interventions to later denoising stages for more stable behavior.
  • At inference, a single scalar continuously traverses the learned control surface without retraining, and experiments on Stable Diffusion 1.5 and SDXL show improved controllability and stability versus LoRA under matched parameter budgets.
  • The paper also introduces an inversion-stability diagnostic based on DDIM inversion. Used as a post-hoc trajectory probe, it correlates strongly with intervention magnitude.
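
The mechanism in the key points above can be illustrated with a minimal, dependency-free sketch. It is not the paper's implementation: the class name `SteeringModulator`, the hard gate threshold `t_gate`, the convention that `t` lies in [0, 1] with small `t` meaning later denoising stages, and the fixed `code` vector (prompt-conditioned in the paper) are all assumptions made for illustration.

```python
class SteeringModulator:
    """Hypothetical bottlenecked FiLM-style modulator for one frozen layer."""

    def __init__(self, code_dim, channels, t_gate=0.5):
        # Zero-initialized projections from the latent code to (gamma, beta),
        # so the untrained modulator leaves activations unchanged.
        self.w_gamma = [[0.0] * code_dim for _ in range(channels)]
        self.w_beta = [[0.0] * code_dim for _ in range(channels)]
        self.t_gate = t_gate

    def gate(self, t):
        # Assumed convention: t = 1 is pure noise, t = 0 is the clean image,
        # so "later denoising stages" means small t. A hard gate is used
        # here for simplicity; the paper's gate may be smooth.
        return 1.0 if t <= self.t_gate else 0.0

    @staticmethod
    def _project(weights, code):
        # Linear projection of the small latent code to per-channel values.
        return [sum(w * c for w, c in zip(row, code)) for row in weights]

    def __call__(self, h, code, t, scale):
        # h: per-channel activations from the frozen backbone.
        g = scale * self.gate(t)
        gamma = self._project(self.w_gamma, code)
        beta = self._project(self.w_beta, code)
        # FiLM-style modulation: h * (1 + g * gamma) + g * beta.
        return [x * (1.0 + g * ga) + g * be for x, ga, be in zip(h, gamma, beta)]


mod = SteeringModulator(code_dim=4, channels=3)
h = [0.5, -1.0, 2.0]
code = [1.0, 0.0, -1.0, 2.0]
base = mod(h, code, t=0.2, scale=0.0)  # zero scale: activations pass through
```

Because the scale multiplies both the multiplicative and additive terms, setting it to zero returns the frozen backbone's activations unchanged regardless of the learned weights, and in this toy version the size of the shift grows linearly with the scale while the gate is open.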

Abstract

We introduce SteeringDiffusion, a bottlenecked activation-level control interface for diffusion models that exposes a smooth, monotonic, and runtime-adjustable control surface over the content–style trade-off. Our method keeps the U-Net backbone frozen and learns a small, prompt-conditioned latent code projected to FiLM/AdaGN-style modulation parameters. A zero-initialized design guarantees exact equivalence to the base model at zero scale, while timestep-aware gating restricts modulation to later denoising stages. A single scalar at inference continuously traverses the control surface without retraining. Across experiments on Stable Diffusion 1.5 and SDXL covering multiple artistic styles, we show that SteeringDiffusion produces smooth and monotonic content–style trade-offs. Under matched parameter budgets, it outperforms LoRA in controllability and stability, while ControlNet and rank-1 adapters do not expose a comparable control surface. We further introduce an inversion-stability diagnostic based on DDIM inversion, used as a post-hoc trajectory probe, which reveals strong correlations with intervention magnitude. These results position Steering Bottlenecked Explicit Control (S-BEC) as a practical, general-purpose control interface for frozen diffusion backbones.
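
The abstract does not define the inversion-stability diagnostic, so the following is only one plausible reading: invert a sample with deterministic DDIM (eta = 0), sample back, and measure the round-trip reconstruction error at several control scales. The scalar state, the linear `abar` schedule, and the `toy_eps` noise predictor are hypothetical stand-ins for a real latent, noise schedule, and steered U-Net.

```python
import math


def ddim_step(x, eps, abar_from, abar_to):
    """Deterministic DDIM update (eta = 0) between two noise levels."""
    x0_pred = (x - math.sqrt(1.0 - abar_from) * eps) / math.sqrt(abar_from)
    return math.sqrt(abar_to) * x0_pred + math.sqrt(1.0 - abar_to) * eps


def round_trip_error(x0, eps_model, abar):
    """Invert x0 to the noisiest level, sample back, return |x0_rec - x0|.

    abar[t] is the cumulative signal fraction at step t, with abar[0] = 1.
    """
    T = len(abar) - 1
    x = x0
    for t in range(T):  # inversion: step t -> t + 1
        x = ddim_step(x, eps_model(x, t), abar[t], abar[t + 1])
    for t in range(T, 0, -1):  # sampling: step t -> t - 1
        x = ddim_step(x, eps_model(x, t), abar[t], abar[t - 1])
    return abs(x - x0)


def toy_eps(scale):
    # Hypothetical steered noise predictor: a base term plus a
    # perturbation whose size is proportional to the control scale.
    return lambda x, t: math.tanh(x) + scale * math.sin(3.0 * x)


abar = [1.0 - 0.9 * t / 20 for t in range(21)]  # toy schedule, 1.0 down to 0.1
errors = {s: round_trip_error(0.7, toy_eps(s), abar) for s in (0.0, 0.5, 1.0)}
```

In this toy setting the round trip is exact (up to floating-point error) only when the predicted noise does not depend on the state. Comparing `errors` across scales is the kind of post-hoc comparison the diagnostic suggests, though the paper's actual metric and correlation analysis may differ.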