Beyond the Crop: Automating "Ghost Mannequin" Effects with Depth-Aware Inpainting

Dev.to / 4/20/2026


Key Points

  • Professional “ghost mannequin” e-commerce apparel photos traditionally require extensive manual clipping and compositing of separate model/garment images to achieve a realistic 3D “worn” look without a visible person.
  • Rewarx Studio AI proposes a generative-AI approach that focuses on the core difficulty of reconstructing occluded interior regions (e.g., the inside collar and sleeve opening curvature) while preserving consistent studio lighting.
  • The company’s pipeline uses SAM for semantic masking, Depth Anything for generating a depth map to guide shading and perceived 3D structure, and a fine-tuned SDXL inpainting model for context-aware reconstruction.
  • They emphasize prompt/latent guidance with technical descriptors tailored to “invisible interior” texture, including constraints to avoid artifacts like floating limbs or distorted seams.
  • The workflow reportedly cuts turnaround time from 20–30 minutes per image to under 15 seconds, enabling large-scale processing for catalogs with hundreds of SKUs.

**The Struggle of E-commerce Apparel 👕**
In professional apparel photography, the "ghost mannequin" (or hollow man) effect is the gold standard: it makes clothing look 3D and "worn" without a visible model. Traditionally, achieving it requires hours of manually clipping and compositing two separate photos, one with the garment on the model and one with the garment turned inside-out.

At Rewarx Studio AI, we decided this was a perfect problem for Generative AI to solve—but it’s a lot harder than just hitting "Generate."

**The Technical Challenge: Depth & Occlusion**
The hardest part isn't removing the model; it's reconstructing what was behind them. Specifically:

* The inner back of the collar.
* The curvature of the sleeve openings.
* Maintaining lighting consistency inside the "hollow" areas.

**Our 3-Step Pipeline 🛠️**
To solve this, we moved away from generic inpainting and built a specialized pipeline:

**Semantic Masking (SAM):** We use the Segment Anything Model to precisely isolate the garment. But we don't just mask out the model; we also predict the "inner" bounds where a mannequin would logically end.
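One crude way to approximate that "inner bounds" idea, as a minimal sketch: take the band just inside the garment silhouette that SAM returns and flag it as the region to reconstruct. The helpers `erode` and `inner_rim` below are ours for illustration, not part of the SAM library:

```python
import numpy as np

def erode(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """4-neighbour binary erosion (assumes the mask doesn't touch the
    image border, since np.roll wraps around)."""
    m = mask.astype(bool)
    for _ in range(iterations):
        m = (m
             & np.roll(m, 1, axis=0) & np.roll(m, -1, axis=0)
             & np.roll(m, 1, axis=1) & np.roll(m, -1, axis=1))
    return m

def inner_rim(garment_mask: np.ndarray, width: int = 2) -> np.ndarray:
    """Band just inside the garment silhouette: a stand-in for the
    'inner bounds' where a mannequin would logically end."""
    return garment_mask.astype(bool) & ~erode(garment_mask, iterations=width)

# Toy 12x12 garment silhouette standing in for a SAM mask
mask = np.zeros((12, 12), dtype=bool)
mask[2:10, 3:9] = True
rim = inner_rim(mask, width=2)  # pixels flagged for interior reconstruction
```

In production you'd want a learned prediction here (collar depth varies by garment), but the rim band is a useful prior for where the inpainting mask should live.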

**Depth Estimation (Depth Anything):** To make the clothing look 3D rather than like a flat sticker, we generate a depth map. It tells the model, in effect, "this collar area is 5 cm behind the front zipper," which guides the shading.
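A sketch of how a relative depth map can drive that shading: normalize the depth values and darken interior pixels proportionally to how far behind the front of the garment they sit. The `max_darken` parameter is illustrative, not a value from our pipeline:

```python
import numpy as np

def depth_to_shading(depth: np.ndarray, max_darken: float = 0.35) -> np.ndarray:
    """Map a relative depth map (larger = farther away) to a per-pixel
    brightness multiplier: the deeper inside the garment, the darker."""
    rng = depth.max() - depth.min()
    d = (depth - depth.min()) / (rng + 1e-8)  # normalize to [0, 1]
    return 1.0 - max_darken * d               # 1.0 at the front, dimmer behind

# Three pixels: front zipper -> mid -> inner back of collar
depth = np.array([[0.0, 0.5, 1.0]])
shade = depth_to_shading(depth)  # front stays bright, deepest point darkens to ~0.65
```

Multiplying the inpainted pixels by this map is what keeps the hollow interior from looking uniformly lit, which is the main tell of a bad ghost-mannequin composite.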

**Context-Aware Inpainting:** This is where the magic happens. We use a fine-tuned SDXL inpainting model that understands apparel structure.

**Let’s Talk Prompts (The Precision Part) 🔍**
For a model to understand "invisible interior," generic terms fail. We inject technical descriptors into the prompt conditioning to guide the texture:

*Internal Prompt Logic:*

(3D hollow effect:1.2), (inner garment texture:1.3), invisible mannequin, detailed fabric weave inside collar, consistent studio lighting, photorealistic, 8k, --no floating limbs, --no distorted seams
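For readers wiring this up themselves: the `(term:weight)` syntax is the attention-weighting convention from common Stable Diffusion tooling, and SDXL pipelines typically take the negatives as a separate negative prompt. A minimal, illustrative parser showing how the prompt above decomposes into weighted positives and negatives:

```python
import re

PROMPT = ("(3D hollow effect:1.2), (inner garment texture:1.3), invisible mannequin, "
          "detailed fabric weave inside collar, consistent studio lighting, "
          "photorealistic, 8k, --no floating limbs, --no distorted seams")

def parse_prompt(prompt: str):
    """Split a weighted prompt into (term, weight) pairs and a negatives list.
    '(term:w)' carries an explicit attention weight; '--no x' becomes a negative."""
    positives, negatives = [], []
    for part in (p.strip() for p in prompt.split(",")):
        if part.startswith("--no "):
            negatives.append(part[len("--no "):])
            continue
        m = re.fullmatch(r"\((.+):([\d.]+)\)", part)
        if m:
            positives.append((m.group(1), float(m.group(2))))
        else:
            positives.append((part, 1.0))
    return positives, negatives

pos, neg = parse_prompt(PROMPT)
# pos[0] -> ('3D hollow effect', 1.2); neg -> ['floating limbs', 'distorted seams']
```

The point of the explicit weights is that "inner garment texture" has to outcompete the model's prior of generating a person inside the clothes, hence the above-1.0 emphasis.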

**The Result**
We’ve managed to reduce a process that usually takes a senior retoucher 20–30 minutes per image down to under 15 seconds. For a brand with 500 SKUs, that’s a game-changer.

**What’s your take?**
I’m curious: has anyone else in the community experimented with combining depth maps and inpainting for industrial use cases? I’d love to hear your thoughts on maintaining material texture during high-strength inpainting.

Cheers,
Keble
Founder @ Rewarx Studio AI