Em-Garde: A Propose-Match Framework for Proactive Streaming Video Understanding
arXiv cs.CV · March 20, 2026
Key Points
- Em-Garde decouples semantic understanding from streaming perception to improve efficiency in proactive video understanding.
- At query time, the Instruction-Guided Proposal Parser converts user queries into structured, perceptually grounded visual proposals.
- During streaming, a Lightweight Proposal Matching Module performs embedding-based matching to trigger responses with reduced computation.
- Experiments on StreamingBench and OVO-Bench show consistent improvements in proactive response accuracy and efficiency over prior models.
- The work demonstrates a practical solution for proactive video understanding under strict computational constraints.
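The propose-match idea in the key points above can be illustrated with a minimal sketch: proposal embeddings are computed once at query time, and the streaming loop only performs cheap similarity checks per frame. The class and method names, the cosine-similarity metric, and the threshold value are illustrative assumptions, not details from the paper.

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class ProposalMatcher:
    """Hypothetical sketch of a lightweight proposal-matching module.

    Proposal embeddings (produced offline by a parser from the user query)
    are compared against each incoming frame embedding; a response is
    triggered when similarity exceeds a threshold. All names and the
    threshold are assumptions for illustration.
    """

    def __init__(self, proposal_embeddings, threshold=0.9):
        # proposal_embeddings: list of (proposal_name, embedding_vector)
        self.proposals = proposal_embeddings
        self.threshold = threshold

    def match(self, frame_embedding):
        # Return the first matching proposal name, or None to keep streaming.
        for name, vec in self.proposals:
            if cosine_sim(frame_embedding, vec) >= self.threshold:
                return name
        return None
```

In this sketch the per-frame cost is a handful of dot products rather than a full vision-language forward pass, which is the efficiency argument the key points describe.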