Static and Dynamic Graph Alignment Network for Temporal Video Grounding
arXiv cs.CV / 5/4/2026
Key Points
- Temporal Video Grounding (TVG) seeks to match natural-language queries to the correct temporal segments within untrimmed videos, and recent GCN-based approaches build clip-level temporal graphs to improve reasoning.
- Existing GCN-based methods share three limitations: they use only static or only dynamic node features, they construct temporal graphs in a query-agnostic way, and they rely on single-granularity semantic matching, which slows convergence and hurts localization precision.
- The proposed Static and Dynamic Graph Alignment Network (SDGAN) builds two complementary temporal graphs using both static and dynamic visual features, then aligns nodes position-wise to form a richer representation.
- SDGAN adds query-clip contrastive learning and adaptive graph modeling to make the temporal graph explicitly query-aware, improving alignment between visual clips and textual queries.
- It further uses multi-granularity temporal proposals with a progressive easy-to-hard training strategy that bridges coarse localization and fine boundary refinement, achieving better results on three benchmark datasets; code and data are released on GitHub.
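Two of the ideas above lend themselves to a compact illustration: position-wise alignment of the static and dynamic graph node features, and an InfoNCE-style query-clip contrastive loss that makes the representation query-aware. The sketch below is a simplified, hypothetical rendering of those ideas in NumPy, not the authors' released code; the function names, the concatenation-based alignment, and the temperature value are all assumptions for illustration.

```python
import numpy as np

def align_graphs(static_feats, dynamic_feats):
    """Position-wise alignment: fuse static and dynamic node features
    clip-by-clip. Concatenation is a simplified stand-in for SDGAN's
    alignment step (assumption, not the paper's exact operator)."""
    assert static_feats.shape == dynamic_feats.shape
    return np.concatenate([static_feats, dynamic_feats], axis=-1)

def query_clip_contrastive_loss(clip_feats, query_feat, pos_mask, temp=0.1):
    """InfoNCE-style loss: pull the query toward clips inside the target
    segment (pos_mask == 1) and push it away from the remaining clips."""
    # Cosine similarity between each clip node and the sentence query.
    c = clip_feats / np.linalg.norm(clip_feats, axis=-1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)
    sims = c @ q / temp
    # Log-softmax over all clips, averaged over the positive clips.
    log_prob = sims - np.log(np.exp(sims).sum())
    return -(log_prob * pos_mask).sum() / pos_mask.sum()
```

With this loss, a query whose positive mask covers clips that already resemble it incurs a near-zero penalty, while a mask over dissimilar clips is penalized heavily, which is the gradient signal that makes the temporal graph query-aware.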