Video Patch Pruning: Efficient Video Instance Segmentation via Early Token Reduction

arXiv cs.CV / 4/2/2026


Key Points

  • Proposes Video Patch Pruning (VPP), a method enabling early-layer token (patch) reduction, previously unexplored, in Vision Transformer-based video instance segmentation.
  • Motivated by the observation that features extracted from deeper layers exhibit foreground selectivity, the authors introduce a fully differentiable module that uses temporal prior knowledge to select important patches even in early layers.
  • Reports up to 60% patch reduction on dense prediction tasks, exceeding the roughly 30% reduction typical of conventional image-based patch pruning.
  • On YouTube-VIS 2021, performance remains stable even in the high-sparsity regime with patch usage below 55%, with a maximum performance drop of 0.6%.
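The core idea, scoring early-layer patches against a temporal prior taken from the previous frame's deeper features, can be illustrated with a minimal sketch. Note this is an assumption-laden simplification: the paper's module is fully differentiable and learned, whereas the hard top-k selection, the mean-pooled foreground prototype, and all names below are hypothetical stand-ins.

```python
import numpy as np

def prune_patches(patch_feats, prior_feats, keep_ratio=0.4):
    """Score each early-layer patch of the current frame against a
    foreground prototype pooled from the previous frame's deep-layer
    features (the temporal prior), then keep the top-scoring fraction.

    patch_feats: (N, D) early-layer patch tokens of the current frame
    prior_feats: (M, D) deep-layer tokens from the previous frame
    """
    # Foreground prototype: mean of the temporal-prior features.
    proto = prior_feats.mean(axis=0)
    proto /= np.linalg.norm(proto) + 1e-8
    feats = patch_feats / (np.linalg.norm(patch_feats, axis=1, keepdims=True) + 1e-8)
    scores = feats @ proto  # cosine similarity per patch
    k = max(1, int(round(keep_ratio * len(scores))))
    keep = np.argsort(scores)[::-1][:k]  # indices of retained patches
    return np.sort(keep), scores

rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 64))  # 14x14 token grid
prior = rng.normal(size=(196, 64))
keep, scores = prune_patches(patches, prior, keep_ratio=0.4)
print(len(keep))  # 78 of 196 patches survive (~60% pruned)
```

In the actual method this hard selection would be replaced by a differentiable relaxation (e.g., a straight-through or soft top-k operator) so the temporal mapping can be trained end to end.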

Abstract

Vision Transformers (ViTs) have demonstrated state-of-the-art performance in several benchmarks, yet their high computational costs hinder their practical deployment. Patch Pruning offers significant savings, but existing approaches restrict token reduction to deeper layers, leaving early-stage compression unexplored. This limits their potential for holistic efficiency. In this work, we present a novel Video Patch Pruning framework (VPP) that integrates temporal prior knowledge to enable efficient sparsity within early ViT layers. Our approach is motivated by the observation that prior features extracted from deeper layers exhibit strong foreground selectivity. Therefore, we propose a fully differentiable module for temporal mapping to accurately select the most relevant patches in early network stages. Notably, the proposed method enables a patch reduction of up to 60% in dense prediction tasks, exceeding the capabilities of conventional image-based patch pruning, which typically operates around 30% patch sparsity. VPP excels in the high-sparsity regime, sustaining remarkable performance even when patch usage is reduced below 55%. Specifically, it preserves stable results with a maximal performance drop of 0.6% on the YouTube-VIS 2021 dataset.
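To see why early-layer pruning pays off more than deep-layer pruning, recall that self-attention cost grows quadratically with the token count, so pruning early compounds through every subsequent layer. A back-of-envelope estimate (not a figure from the paper; linear projections and MLPs actually scale closer to linearly in the token count, so real savings sit between the two):

```python
def attention_cost_ratio(keep_ratio):
    """Self-attention FLOPs scale roughly with the square of the token
    count, so keeping a fraction r of tokens costs about r**2 of the
    original attention compute in every layer after the pruning point."""
    return keep_ratio ** 2

# 60% pruning (keep 40%) vs. the ~30% pruning typical of image methods
print(round(attention_cost_ratio(0.40), 2))  # attention cost falls to ~16%
print(round(attention_cost_ratio(0.70), 2))  # vs. ~49% at 30% pruning
```

This is why pushing pruning into early layers, as VPP does, yields savings that deep-layer-only pruning cannot reach.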