AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe

arXiv cs.CV / 4/24/2026


Key Points

  • The paper introduces AttentionBender, a tool that manipulates cross-attention in Video Diffusion Transformers to let artists explore how black-box video generation actually works.
  • Because prompt-only control is limited, the authors take an autobiographical "research-through-design" approach, building on Network Bending to apply 2D transforms to cross-attention maps and thereby modulate what the model generates.
  • Experiments visualize 4,500+ video generations while varying prompts, attention-map operations, and target layers to evaluate controllability.
  • The findings indicate that cross-attention is strongly entangled: targeted edits rarely stay localized, producing distributed distortions and glitch-like aesthetics rather than clean, direct changes.
  • AttentionBender is positioned both as an Explainable AI-style probe of transformer attention mechanisms and as a creative method to generate aesthetics outside the model’s default learned representational space.

Abstract

We present AttentionBender, a tool that manipulates cross-attention in Video Diffusion Transformers to help artists probe the internal mechanics of black-box video generation. While generative outputs are increasingly realistic, prompt-only control limits artists' ability to build intuition for the model's material process or to work beyond its default tendencies. Using an autobiographical research-through-design approach, we built on Network Bending to design AttentionBender, which applies 2D transforms (rotation, scaling, translation, etc.) to cross-attention maps to modulate generation. We assess AttentionBender by visualizing 4,500+ video generations across prompts, operations, and layer targets. Our results suggest that cross-attention is highly entangled: targeted manipulations often resist clean, localized control, producing distributed distortions and glitch aesthetics rather than linear edits. AttentionBender contributes a tool that functions both as an Explainable AI-style probe of transformer attention mechanisms, and as a creative technique for producing novel aesthetics beyond the model's learned representational space.
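The core operation described above — applying a 2D transform to a cross-attention map and renormalizing before the model consumes it — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names (`rotate_map`, `bend_attention`), the `(n_tokens, H, W)` layout, and the nearest-neighbour resampling are all assumptions made here for illustration; a real Video Diffusion Transformer would apply this inside an attention hook during denoising.

```python
import numpy as np

def rotate_map(m, degrees):
    """Nearest-neighbour rotation of a 2D map about its centre
    (illustrative stand-in for the paper's 2D transforms)."""
    h, w = m.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    th = np.deg2rad(degrees)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find the source pixel.
    sy = cy + np.cos(th) * (ys - cy) + np.sin(th) * (xs - cx)
    sx = cx - np.sin(th) * (ys - cy) + np.cos(th) * (xs - cx)
    sy = np.clip(np.rint(sy).astype(int), 0, h - 1)
    sx = np.clip(np.rint(sx).astype(int), 0, w - 1)
    return m[sy, sx]

def bend_attention(attn, degrees=30.0):
    """attn: (n_tokens, H, W) cross-attention weights, one spatial
    map per text token. Rotate each token's map, then renormalize
    so the token weights at every location still sum to 1."""
    bent = np.stack([rotate_map(attn[t], degrees)
                     for t in range(attn.shape[0])])
    bent = np.clip(bent, 1e-8, None)  # avoid division by zero
    return bent / bent.sum(axis=0, keepdims=True)
```

Translation and scaling fit the same pattern with a different coordinate mapping. The renormalization step matters: the transformed maps must remain a valid attention distribution, and it is plausibly one reason edits propagate globally rather than staying localized, as the paper's entanglement findings describe.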