PROMO: Promptable Outfitting for Efficient High-Fidelity Virtual Try-On
arXiv cs.CV / 3/13/2026
Key Points
- PROMO is a promptable virtual try-on (VTON) framework built on a Flow Matching DiT (diffusion transformer) backbone with latent multi-modal conditioning. It targets high-fidelity results across subject preservation, texture transfer, and harmonization.
- Efficient conditioning and a self-reference mechanism substantially reduce inference overhead compared with prior VTON methods.
- On standard benchmarks, PROMO surpasses prior VTON methods and general image-editing models in visual fidelity while maintaining a competitive balance between quality and speed.
- The training framework is generic and transferable to broader image-editing tasks, with VTON-paired data providing rich supervision for training general-purpose editors.
- Overall, the work suggests that flow-matching transformers with latent conditioning and self-reference acceleration are an effective, training-efficient route to high-quality virtual try-on, with potential impact on online retail.
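
For readers unfamiliar with flow matching, the following is a minimal, generic sketch of the idea, not PROMO's implementation. It assumes the common straight-line (rectified-flow style) path between a noise sample and a data sample, and it replaces the trained, conditioned DiT with a hypothetical closed-form function, `toy_velocity`, that is exact only when the target is a single point `cond`. All function names here are illustrative assumptions.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for a velocity network on that straight path."""
    return [b - a for a, b in zip(x0, x1)]


def toy_velocity(x, t, cond):
    """Hypothetical stand-in for a trained, conditioned network.

    When the data distribution collapses to the single point `cond`,
    the exact velocity at (x, t) is (cond - x) / (1 - t).
    """
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]


def euler_sample(velocity_fn, x0, cond, num_steps=8):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so toy_velocity never divides by zero
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    noise = [0.5, -1.0, 2.0]   # starting sample (fixed here for reproducibility)
    cond = [1.0, 2.0, 3.0]     # toy "conditioning" target
    print(euler_sample(toy_velocity, noise, cond))
```

In a real system the velocity function would be a learned network that takes the conditioning signals as additional inputs, and the number of Euler steps is one of the knobs that trades quality against inference time.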