TripVVT: A Large-Scale Triplet Dataset and a Coarse-Mask Baseline for In-the-Wild Video Virtual Try-On
arXiv cs.CV / 5/1/2026
Key Points
- The paper addresses limitations in video virtual try-on caused by scarce large-scale in-the-wild triplet data and unreliable mask usage.
- It introduces TripVVT-10K, a large and diverse in-the-wild triplet dataset with explicit video-level cross-garment supervision.
- Building on this dataset, the authors propose TripVVT, a Diffusion Transformer-based framework that replaces fragile garment masks with a stable human-mask prior to better preserve backgrounds under real-world motion and occlusion.
- For evaluation, they release TripVVT-Bench, a 100-case benchmark with varied garments, environments, and multi-person scenes, assessing quality, try-on fidelity, background consistency, and temporal coherence.
- Experiments show that TripVVT improves video quality and garment fidelity and generalizes better to challenging in-the-wild videos. The dataset and benchmark are publicly released.
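To make the idea of background consistency under a human mask concrete, here is a minimal sketch in Python. It is not the paper's metric or code: the function name `background_psnr`, the array layout, and the choice of PSNR are all assumptions made for illustration. It compares a source video with a try-on result only at pixels outside a coarse human mask, which is where a mask-based method should leave the video unchanged.

```python
import numpy as np


def background_psnr(source, output, human_mask, max_val=255.0):
    """PSNR computed only over pixels outside the human mask.

    Hypothetical helper for illustration; not taken from TripVVT.
    source, output: uint8 arrays of shape (T, H, W, 3)
    human_mask:     bool array of shape (T, H, W), True where a person is
    """
    background = ~human_mask
    if not background.any():
        raise ValueError("mask covers every pixel; no background to compare")

    diff = source.astype(np.float64) - output.astype(np.float64)
    # Boolean indexing with a (T, H, W) mask keeps the channel axis: (N, 3).
    mse = np.mean(diff[background] ** 2)
    if mse == 0:
        return float("inf")  # background is bit-identical
    return 10.0 * np.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    # Toy clip: 2 frames of 4x4 pixels, with a 2x2 "person" in the centre.
    src = np.full((2, 4, 4, 3), 100, dtype=np.uint8)
    mask = np.zeros((2, 4, 4), dtype=bool)
    mask[:, 1:3, 1:3] = True

    edited = src.copy()
    edited[mask] = 200  # change only the masked (person) region
    print(background_psnr(src, edited, mask))
```

In the toy clip only the masked region is edited, so the background score is infinite; any change that leaks outside the mask lowers it.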