Stay in your Lane: Role Specific Queries with Overlap Suppression Loss for Dense Video Captioning
arXiv cs.CV / 3/13/2026
Key Points
- Introduces role-specific queries to decouple localization and captioning in dense video captioning (DVC), reducing cross-task interference.
- Adds a suppression mechanism that penalizes mutual temporal overlaps across queries to learn non-overlapping, more precise event regions.
- Applies contrastive alignment to ensure semantic consistency between the separated localization and captioning outputs.
- Proposes a lightweight core-concept module to enrich captions with concept-level representations for improved semantic richness.
- Validates the approach on two major DVC benchmarks, YouCook2 and ActivityNet Captions, reporting performance gains.
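
To make the overlap-suppression idea in the second key point concrete, here is a minimal sketch of one way such a penalty could be computed. It is not the paper's loss: the summary does not give the formulation, so the mean pairwise temporal IoU used here, the `(start, end)` segment representation, and the function names `temporal_iou` and `overlap_penalty` are all illustrative assumptions. A real training loss would operate on differentiable tensors rather than Python floats.

```python
from typing import List, Tuple

# Hypothetical representation: each query predicts one (start, end) segment in seconds.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1-D temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def overlap_penalty(segments: List[Segment]) -> float:
    """Mean temporal IoU over all distinct pairs of predicted segments.

    Assumption: averaging pairwise IoU stands in for the paper's
    suppression term, whose exact form is not given in the summary.
    """
    n = len(segments)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += temporal_iou(segments[i], segments[j])
    return total / (n * (n - 1) / 2)


if __name__ == "__main__":
    disjoint = [(0.0, 10.0), (10.0, 20.0), (25.0, 30.0)]
    overlapping = [(0.0, 10.0), (5.0, 15.0), (20.0, 30.0)]
    print(overlap_penalty(disjoint))
    print(overlap_penalty(overlapping))
```

Under this sketch, predictions that tile the video without overlapping incur zero penalty, while any pair of queries claiming the same time span raises it, which is the behavior the key point describes.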