Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval
arXiv cs.CV / 4/7/2026
Key Points
- The paper introduces DreamPRVR, a coarse-to-fine framework for Partially Relevant Video Retrieval (PRVR) where text queries describe only partial events in untrimmed videos.
- It generates global contextual semantic “registers” as coarse-grained video highlights using a probabilistic variational sampler followed by iterative refinement with a text-supervised truncated diffusion model.
- The diffusion-based refinement is designed to build a well-formed textual latent space, improving robustness against query ambiguity and local noise from spurious matches.
- DreamPRVR then uses register-augmented Gaussian attention blocks to adaptively fuse these registers with video tokens for context-aware cross-modal matching.
- Experiments report improved performance over state-of-the-art PRVR methods, and the authors have released code for replication.
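To make the register idea in the key points more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`sample_registers`, `gaussian_register_attention`), the `sigma` parameter, and the exact form of the Gaussian bias are assumptions made for illustration, and the text-supervised truncated diffusion refinement is left out entirely. The sketch only shows (1) drawing registers from a Gaussian with the reparameterization trick and (2) letting each video token attend over registers plus video tokens, where token-to-token scores carry a Gaussian locality penalty and registers carry none, so they can act as global context.

```python
import math
import random


def sample_registers(mu, log_var, rng):
    """Reparameterized Gaussian sample, z = mu + sigma * eps, one row per register.

    Hypothetical stand-in for a probabilistic variational sampler.
    """
    return [
        [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu_row, lv_row)]
        for mu_row, lv_row in zip(mu, log_var)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_register_attention(video_tokens, registers, sigma=2.0):
    """Each video token attends over [registers + video tokens].

    Assumed design: token-to-token scores receive a Gaussian locality bias
    -(i - j)^2 / (2 * sigma^2); register keys receive no bias. Learned
    query/key/value projections are omitted to keep the sketch short.
    """
    dim = len(video_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    keys = registers + video_tokens
    n_reg = len(registers)

    outputs = []
    for i, query in enumerate(video_tokens):
        scores = []
        for k_idx, key in enumerate(keys):
            score = sum(q * k for q, k in zip(query, key)) * scale
            if k_idx >= n_reg:
                # Penalize temporally distant video tokens.
                j = k_idx - n_reg
                score -= (i - j) ** 2 / (2.0 * sigma ** 2)
            scores.append(score)
        weights = softmax(scores)
        # Output is a weighted average of the keys (used here as values too).
        outputs.append(
            [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    registers = sample_registers(
        mu=[[0.5, -0.5], [1.0, 0.0]],
        log_var=[[-2.0, -2.0], [-2.0, -2.0]],
        rng=rng,
    )
    video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    fused = gaussian_register_attention(video, registers, sigma=1.0)
    print(len(fused), len(fused[0]))
```

A smaller `sigma` makes each token rely more on its temporal neighbors and on the registers, while a larger `sigma` approaches ordinary softmax attention over all tokens.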



