Em-Garde: A Propose-Match Framework for Proactive Streaming Video Understanding
arXiv cs.CV · March 20, 2026
Key Points
- Em-Garde decouples semantic understanding from streaming perception to improve efficiency in proactive video understanding.
- At query time, the Instruction-Guided Proposal Parser converts user queries into structured, perceptually grounded visual proposals.
- During streaming, a Lightweight Proposal Matching Module performs embedding-based matching to trigger responses with reduced computation.
- Experiments on StreamingBench and OVO-Bench show consistent improvements in proactive response accuracy and efficiency over prior models.
- The work demonstrates a practical solution for proactive video understanding under strict computational constraints.
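The propose-match idea in the key points above can be illustrated with a minimal sketch: proposal embeddings are computed once at query time, and the streaming loop only performs cheap similarity checks per frame. The class and method names, the cosine-similarity metric, and the threshold value are illustrative assumptions, not details from the paper.

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class ProposalMatcher:
    """Hypothetical sketch of a lightweight proposal-matching module.

    Proposal embeddings (produced offline by a parser from the user query)
    are compared against each incoming frame embedding; a response is
    triggered when similarity exceeds a threshold. All names and the
    threshold are assumptions for illustration.
    """

    def __init__(self, proposal_embeddings, threshold=0.9):
        # proposal_embeddings: list of (proposal_name, embedding_vector)
        self.proposals = proposal_embeddings
        self.threshold = threshold

    def match(self, frame_embedding):
        # Return the first matching proposal name, or None to keep streaming.
        for name, vec in self.proposals:
            if cosine_sim(frame_embedding, vec) >= self.threshold:
                return name
        return None
```

In this sketch the per-frame cost is a handful of dot products rather than a full vision-language forward pass, which is the efficiency argument the key points describe.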