Latent Space Probing for Adult Content Detection in Video Generative Models

arXiv cs.CV / 5/5/2026


Key Points

  • The paper addresses a gap in current adult-content moderation for AI video generators by moving detection from prompts or pixel-space outputs to the model’s internal latent representations.
  • It proposes a latent space probing framework that intercepts denoised latents from the CogVideoX diffusion model during inference and adds lightweight classifiers for real-time detection.
  • The authors build a large binary dataset of 11,039 ten-second clips (5,086 violating and 5,953 non-violating) sourced from adult websites and YouTube to train and evaluate the approach.
  • Two lightweight probing classifier architectures are introduced, achieving 97.29% F1 on a held-out test set with an added inference overhead of about 4–6 ms.
  • The results indicate that latent-space signals can improve both detection accuracy and operational cost versus methods limited to prompts or decoded pixels.
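The "lightweight probing classifier" idea from the key points above can be illustrated with a minimal sketch. This is not the paper's actual architecture: `pool_latents`, `LinearProbe`, and the logistic-regression training step are hypothetical stand-ins showing how a small classifier could be attached to pooled latent features.

```python
import numpy as np

def pool_latents(latents):
    """Collapse a (frames, channels, height, width) latent tensor into a
    per-channel feature vector by averaging over time and space.
    This pooling scheme is an assumption, not the paper's design."""
    return latents.mean(axis=(0, 2, 3))

class LinearProbe:
    """Hypothetical minimal probe: logistic regression over pooled latents."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.01, size=dim)  # small random init
        self.b = 0.0

    def predict_proba(self, feat):
        # Sigmoid of a linear score -> probability the clip is violating.
        z = feat @ self.w + self.b
        return 1.0 / (1.0 + np.exp(-z))

    def fit_step(self, feat, label, lr=0.1):
        # One step of gradient descent on binary cross-entropy loss.
        grad = self.predict_proba(feat) - label
        self.w -= lr * grad * feat
        self.b -= lr * grad
```

Because the probe only sees a pooled vector rather than decoded pixels, inference is a single matrix-vector product, which is consistent with the millisecond-scale overhead the paper reports.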

Abstract

The rapid proliferation of AI-powered video generation systems has introduced significant challenges in content moderation, particularly with respect to adult and sexually explicit material. Existing detection methods operate on either prompts or decoded pixel-space outputs; as a result, both approaches are blind to the rich internal representations formed during generation. In this paper, we propose a novel latent space probing framework that intercepts the denoised latent representations produced by the CogVideoX video diffusion model during inference and attaches lightweight classifiers to perform real-time adult content detection. To support this work, we construct a large-scale binary dataset of 11,039 ten-second video clips (5,086 violating, 5,953 non-violating) sourced from adult websites and YouTube respectively. We introduce two lightweight probing classifier architectures and train and evaluate them on this dataset. Our work demonstrates that latent-space signals encode strong discriminative features for harmful content detection, achieving 97.29% F1 on our held-out test set with an inference overhead in the 4–6 ms range. Our results suggest that probing the latent space improves both detection performance and operational cost.
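The interception described in the abstract, reading denoised latents mid-generation rather than waiting for decoded frames, amounts to running a probe inside the sampling loop. The toy loop below is a sketch of that control flow only: `denoise_step` and `probe` are hypothetical stand-ins for the real CogVideoX denoiser and the paper's classifiers, not their actual interfaces.

```python
import numpy as np

def denoise_with_probe(init_latents, steps, denoise_step, probe, threshold=0.5):
    """Toy diffusion sampling loop with an in-loop moderation probe.

    After each denoising step, the current latents are scored by `probe`
    (probability of violating content). Generation can be flagged, and
    optionally halted, without ever decoding to pixel space.
    """
    latents = init_latents
    for t in range(steps, 0, -1):
        latents = denoise_step(latents, t)  # stand-in for the diffusion model
        if probe(latents) > threshold:
            return latents, True            # flagged as violating mid-generation
    return latents, False
```

Halting early on a flagged clip is what makes the latent-space approach cheaper than pixel-space methods: a violating generation never has to be fully sampled or decoded before it is caught.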