AttnRouter: Per-Category Attention Routing for Training-Free Image Editing on MMDiT
arXiv cs.CV / 5/5/2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- The paper studies training-free image editing on Qwen-Image-Edit-2511, a 60-block multi-modal diffusion transformer (MMDiT) that uses a single attention stream mixing noise and source-image tokens.
- It introduces KVInject, a single-forward KV (key/value) injection method that alpha-blends source-half key/value projections into the noise-half within a localized layer/step band, improving results versus the prior MasaCtrl approach while avoiding prompt-mismatch failures.
- The authors find that different edit types require different attention operations, leading to AttnRouter, a per-category routing table that dispatches edits to the attention manipulation best preserving source structure for each category.
- Using ground-truth edit categories, AttnRouter boosts a CLIP-T+DINO-I composite score by 6.4% over a baseline, and an automatic CLIP zero-shot classifier recovers 98% of the gain despite only 55% category accuracy.
- Ablations localize the effective attention sub-circuit: K/V injection restricted to the early denoising steps (S0–7) nearly matches the full-step gains, while other layer/step bands and naive K/V rescaling fail. The authors release code, routing tables, and benchmark subsets.
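The KVInject mechanism described above can be sketched in a few lines. This is a minimal, framework-free illustration, not the paper's implementation: it assumes a single token stream where the first `n_src` positions are source-image tokens and the next `n_src` positions are the aligned noise tokens, and it alpha-blends the source half's key/value vectors into the noise half only inside an early denoising-step band (S0–7, per the ablation).

```python
def blend(noise_vec, src_vec, alpha):
    """Convex blend: alpha weights the injected source vector."""
    return [(1 - alpha) * n + alpha * s for n, s in zip(noise_vec, src_vec)]


def kv_inject(K, V, n_src, alpha, step, step_band=range(8)):
    """Single-forward K/V injection sketch (hypothetical signature).

    K, V: token-major lists of per-token projection vectors. Tokens
    [0, n_src) are source-image tokens; tokens [n_src, 2*n_src) are the
    noise tokens, assumed aligned 1:1 with the source tokens.
    Outside the step band (early steps S0-7 here), K/V pass through
    unchanged.
    """
    if step not in step_band:
        return K, V
    K_out, V_out = list(K), list(V)
    for i in range(n_src):
        # Blend the source-half projection into the aligned noise-half slot.
        K_out[n_src + i] = blend(K[n_src + i], K[i], alpha)
        V_out[n_src + i] = blend(V[n_src + i], V[i], alpha)
    return K_out, V_out
```

In a real MMDiT this would run inside each attention layer of the chosen layer band, on the per-head K/V projections; the queries are left untouched, which is what distinguishes this from MasaCtrl-style mutual self-attention.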
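AttnRouter itself reduces to a lookup: a predicted edit category (ground truth, or the CLIP zero-shot classifier's guess) selects which attention manipulation to apply. A minimal sketch, with illustrative category and operation names that are assumptions rather than the paper's actual table:

```python
# Hypothetical routing table: edit category -> attention operation.
# Category labels and op names are illustrative, not the released table.
ROUTING_TABLE = {
    "color_change": "kv_inject",              # structure-preserving edits
    "object_replacement": "mutual_self_attn", # MasaCtrl-style query sharing
    "style_transfer": "no_injection",         # let the prompt dominate
}

# Fallback for categories the classifier has not seen or mislabels.
DEFAULT_OP = "kv_inject"


def route(category):
    """Dispatch an edit category to its attention manipulation."""
    return ROUTING_TABLE.get(category, DEFAULT_OP)
```

A robust default is what makes the automatic pipeline forgiving: the paper reports that the CLIP classifier recovers 98% of the routing gain despite only 55% category accuracy, which is consistent with misclassified edits still landing on an operation that works acceptably for them.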