ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On

arXiv cs.AI / 4/15/2026

Key Points

  • The paper introduces ART-VITON, a measurement-guided latent diffusion approach for virtual try-on that aims to align the target garment precisely in try-on regions while preserving identity and background in non-try-on regions.
  • It reformulates virtual try-on as a linear inverse problem and uses trajectory-aligned solvers to progressively enforce measurement consistency, reducing abrupt boundary transitions.
  • ART-VITON addresses semantic drift and boundary artifacts by combining residual prior-based initialization with artifact-free measurement-guided sampling steps (including data consistency, frequency-level correction, and periodic standard denoising).
  • Experiments on VITON-HD, DressCode, and SHHQ-1.0 show improved visual fidelity and robustness versus state-of-the-art baselines, with fewer boundary artifacts and better preservation of background and identity.
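
To make the inverse-problem framing in the points above concrete, here is a minimal NumPy sketch. It is not the paper's solver. It assumes the measurement operator is a binary mask that keeps only non-try-on pixels, and it uses a hypothetical linear strength schedule. The names `measure`, `data_consistency`, `keep_mask`, and `strength` are illustrative and do not come from the source. A real sampler would interleave these updates with denoising steps in latent space, which this toy loop omits.

```python
import numpy as np

def measure(x, keep_mask):
    """Assumed linear measurement operator: keep only non-try-on pixels."""
    return keep_mask * x

def data_consistency(x_est, y, keep_mask, strength):
    """Pull x_est toward the measurement y inside the observed region.

    strength is in [0, 1]; 1.0 is equivalent to hard replacement.
    Pixels outside the observed region (the try-on area) are left untouched.
    """
    residual = y - measure(x_est, keep_mask)
    return x_est + strength * keep_mask * residual

rng = np.random.default_rng(0)

# Toy 8x8 "image": the centre 4x4 block plays the role of the try-on region.
original = rng.random((8, 8))
keep_mask = np.ones((8, 8))
keep_mask[2:6, 2:6] = 0.0
y = measure(original, keep_mask)

# Stand-in for a generated sample that disagrees with the original everywhere.
x_init = rng.random((8, 8))
x = x_init.copy()

# Hypothetical schedule: weak enforcement early, full enforcement at the end.
num_steps = 10
errors = []
for step in range(num_steps):
    strength = (step + 1) / num_steps
    x = data_consistency(x, y, keep_mask, strength)
    errors.append(float(np.abs(measure(x, keep_mask) - y).max()))
```

In this sketch the measurement error shrinks step by step instead of dropping to zero in a single replacement, which is the intuition behind enforcing consistency progressively along the sampling trajectory. The try-on block is never modified by the consistency update, so the generative model stays free to synthesize the garment there.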

Abstract

Virtual try-on (VITON) aims to generate realistic images of a person wearing a target garment, requiring precise garment alignment in try-on regions and faithful preservation of identity and background in non-try-on regions. While latent diffusion models (LDMs) have advanced alignment and detail synthesis, preserving non-try-on regions remains challenging. A common post-hoc strategy directly replaces these regions with original content, but abrupt transitions often produce boundary artifacts. To overcome this, we reformulate VITON as a linear inverse problem and adopt trajectory-aligned solvers that progressively enforce measurement consistency, reducing abrupt changes in non-try-on regions. However, existing solvers still suffer from semantic drift during generation, leading to artifacts. We propose ART-VITON, a measurement-guided diffusion framework that ensures measurement adherence while maintaining artifact-free synthesis. Our method integrates residual prior-based initialization to mitigate training-inference mismatch and artifact-free measurement-guided sampling that combines data consistency, frequency-level correction, and periodic standard denoising. Experiments on VITON-HD, DressCode, and SHHQ-1.0 demonstrate that ART-VITON effectively preserves identity and background, eliminates boundary artifacts, and consistently improves visual fidelity and robustness over state-of-the-art baselines.
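
One way to write down the linear inverse problem described in the abstract is sketched below. The notation is an assumption made for illustration and is not taken from the paper: x is the try-on image, m is a binary mask equal to 1 on non-try-on pixels, y is the observed non-try-on content, x_gen is a generated sample, and λ_t is a hypothetical per-step weight. The formulas are written in pixel space for simplicity, whereas the method itself operates on latents.

```latex
% Assumed notation (not from the source).
\begin{align}
  % Measurement model: only non-try-on pixels are observed.
  y &= A x, \qquad A = \operatorname{diag}(m), \quad m \in \{0,1\}^{n} \\
  % Post-hoc replacement: a single hard blend applied after generation.
  \hat{x}_{\text{post-hoc}} &= m \odot y + (1 - m) \odot x_{\text{gen}} \\
  % Generic data-consistency step on the clean-image estimate at step t.
  \hat{x}_0^{(t)} &\leftarrow \hat{x}_0^{(t)}
      - \lambda_t \, A^{\top}\bigl(A \hat{x}_0^{(t)} - y\bigr),
      \qquad \lambda_t \in [0, 1]
\end{align}
```

Because A is a diagonal 0/1 matrix, setting λ_t = 1 in the last line gives (1 - m) ⊙ x̂_0 + y, which is exactly the hard replacement of the second line. Smaller values of λ_t spread the same correction over many sampling steps, which is the sense in which a trajectory-aligned solver avoids one abrupt transition at the mask boundary.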