PROMO: Promptable Outfitting for Efficient High-Fidelity Virtual Try-On
arXiv cs.CV / 3/13/2026
Key Points
- PROMO is a promptable virtual try-on (VTON) framework built on a Flow Matching DiT backbone with latent multi-modal conditioning. It targets high-fidelity results, including subject preservation, texture transfer, and harmonization.
- Efficient conditioning and a self-reference mechanism substantially reduce its inference overhead compared with prior VTON methods.
- On standard benchmarks, PROMO surpasses prior VTON methods and general image-editing models in visual fidelity while maintaining a competitive balance between quality and speed.
- The training framework is generic and transferable to broader image-editing tasks, with VTON-paired data providing rich supervision for training general-purpose editors.
- The work highlights that flow-matching transformers with latent conditioning and self-reference acceleration offer an effective, training-efficient solution for high-quality virtual try-on with potential impact on online retail.
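
For readers unfamiliar with flow matching, the sampling loop behind such a backbone can be sketched in a few lines. This is a toy illustration, not PROMO's implementation: it assumes the common straight-line path x_t = (1 - t) * x0 + t * x1 and replaces the learned, conditioned DiT velocity predictor with the closed-form velocity toward a single known target. The names `target_velocity` and `euler_sample` are hypothetical.

```python
import random


def target_velocity(x, t, x1):
    """Closed-form velocity for the straight-line path x_t = (1 - t) * x0 + t * x1.

    Assumption: in a real model this would be a learned network (e.g. a DiT)
    that predicts the velocity from x, t, and conditioning latents.
    """
    return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]


def euler_sample(x0, x1, num_steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) toward t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the velocity is well defined
        v = target_velocity(x, t, x1)
        x = [a + dt * vi for a, vi in zip(x, v)]
    return x


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4)]  # stand-in for a noise latent
target = [0.5, -1.0, 2.0, 0.0]                      # stand-in for a data latent
sample = euler_sample(noise, target)
print(sample)
```

With a learned velocity field, the number of integration steps is one of the main levers on inference cost, which is part of why this family of models is attractive when speed matters.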




