Color When It Counts: Grayscale-Guided Online Triggering for Always-On Streaming Video Sensing

arXiv cs.CV / 3/25/2026


Key Points

  • The paper proposes a “grayscale-always, color-on-demand” paradigm to reduce the cost of always-on streaming video sensing on edge and wearable devices.
  • It argues that continuous grayscale streams preserve temporal structure well enough that only sparse RGB frames are needed for near-baseline streaming video understanding performance.
  • The method, ColorTrigger, is an online, training-free trigger that selectively turns on RGB capture using windowed grayscale affinity analysis.
  • ColorTrigger detects chromatic redundancy causally with lightweight quadratic programming and manages sensing/inference tradeoffs via credit-budgeted control and dynamic token routing.
  • On streaming video understanding benchmarks, it reaches 91.6% of a full-color baseline while using just 8.1% of RGB frames, suggesting substantial color redundancy in natural videos.
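The points above do not give the exact formulation of the quadratic program. One plausible reading of "windowed grayscale affinity analysis" plus "lightweight quadratic programming" is sketched below as an assumption, not as ColorTrigger's actual method. The current grayscale feature is reconstructed as a convex combination of the features in a recent window, which is a simplex-constrained least-squares QP. A frame that the window explains well is treated as chromatically redundant, and a poorly explained frame requests color. The names `simplex_residual`, `should_trigger`, and the threshold `tau` are hypothetical. The Gram matrix `G` plays the role of the window's affinity matrix, and the solver is plain projected gradient descent.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:  # the last j satisfying this condition fixes the shift theta
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def simplex_residual(window, x, iters=200):
    """Hypothetical QP: min_w ||x - sum_i w_i f_i||^2 subject to w on the simplex.

    window: non-empty list of past grayscale feature vectors f_i.
    Returns (weights, residual_norm). Solved by projected gradient descent.
    """
    k = len(window)
    G = [[dot(fi, fj) for fj in window] for fi in window]  # windowed affinity (Gram) matrix
    b = [dot(fi, x) for fi in window]
    xx = dot(x, x)
    trace = sum(G[i][i] for i in range(k))
    # trace(G) >= largest eigenvalue, so this step is a safe (conservative) 1/L.
    step = 1.0 / (2.0 * trace) if trace > 0 else 0.0
    w = [1.0 / k] * k
    for _ in range(iters):
        grad = [2.0 * (dot(G[i], w) - b[i]) for i in range(k)]
        w = project_to_simplex([wi - step * gi for wi, gi in zip(w, grad)])
    obj = dot(w, [dot(G[i], w) for i in range(k)]) - 2.0 * dot(b, w) + xx
    return w, math.sqrt(max(obj, 0.0))


def should_trigger(window, x, tau):
    """Hypothetical rule: request RGB when the window poorly explains frame x."""
    if not window:
        return True  # nothing to compare against yet, so capture color
    _, r = simplex_residual(window, x)
    rel = r / max(math.sqrt(dot(x, x)), 1e-12)  # scale-invariant residual
    return rel > tau
```

For example, with a window of `[[1.0, 0, 0], [0, 1.0, 0]]`, a frame `[0, 0, 1.0]` lies outside the window's convex hull and triggers color, while `[0.5, 0.5, 0]` is reconstructed exactly and does not. Because only the small `k x k` affinity matrix enters the solver, the per-frame cost depends on the window length rather than the image size. This fits the "lightweight" framing, though the paper's actual features and solver may differ.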

Abstract

Always-on sensing is essential for next-generation edge/wearable AI systems, yet continuous high-fidelity RGB video capture remains prohibitively expensive for resource-constrained mobile and edge platforms. We present a new paradigm for efficient streaming video understanding: grayscale-always, color-on-demand. Through preliminary studies, we find that color is not always necessary: sparse RGB frames suffice for comparable performance when temporal structure is preserved via continuous grayscale streams. Building on this insight, we propose ColorTrigger, an online, training-free trigger that selectively activates color capture based on windowed grayscale affinity analysis. Designed for real-time edge deployment, ColorTrigger uses lightweight quadratic programming to detect chromatic redundancy causally, coupled with credit-budgeted control and dynamic token routing to jointly reduce sensing and inference costs. On streaming video understanding benchmarks, ColorTrigger achieves 91.6% of full-color baseline performance while using only 8.1% of RGB frames, demonstrating substantial color redundancy in natural videos and enabling practical always-on video sensing on resource-constrained devices.
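The abstract names "credit-budgeted control" without specifying it. A token bucket is a common way to cap how often an expensive action may fire, so the sketch below uses one as a stand-in. This is an assumption, not the paper's mechanism. Each grayscale frame earns `rate` credits, up to `capacity`. An RGB capture costs one credit and happens only when the trigger asks for color and a full credit is available. `CreditBudget`, `rate`, and `capacity` are hypothetical names. The value `rate=0.081` appears below only to mirror the reported 8.1% RGB fraction and is not a setting taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CreditBudget:
    """Hypothetical token-bucket budget gating RGB capture."""
    rate: float          # credits earned per grayscale frame (long-run cap on RGB fraction)
    capacity: float      # maximum banked credits, which bounds burst length
    credits: float = 0.0

    def step(self, wants_color: bool) -> bool:
        """Advance one frame; return True if an RGB frame is captured."""
        self.credits = min(self.capacity, self.credits + self.rate)
        if wants_color and self.credits >= 1.0:
            self.credits -= 1.0
            return True
        return False


def rgb_fraction(decisions):
    """Share of frames on which color was captured."""
    return sum(decisions) / len(decisions) if decisions else 0.0


# Worst case: the trigger asks for color on every frame; the budget still caps the rate.
budget = CreditBudget(rate=0.081, capacity=3.0)
decisions = [budget.step(wants_color=True) for _ in range(1000)]
print(sum(decisions), rgb_fraction(decisions))
```

Under this design the trigger decides *whether* color is useful and the budget decides *whether it is affordable*, so a burst of scene changes can spend banked credits while the long-run RGB fraction stays near `rate`. The `capacity` parameter trades responsiveness to bursts against short-term power spikes.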