AdaSpark: Adaptive Sparsity for Efficient Long-Video Understanding

arXiv cs.CV / 4/10/2026


Key Points

  • AdaSpark proposes an adaptive sparsity framework to make Video-LLMs practical for long-form video by avoiding the high compute cost of dense processing.
  • The method partitions videos into 3D spatio-temporal cubes and uses co-designed, context-aware components (AdaS-Attn for cube selection and AdaS-FFN for token selection) to focus compute on what matters per query.
  • An entropy-based (Top-p) selection strategy dynamically allocates resources based on input complexity rather than relying on rigid sparse patterns.
  • Experiments report up to a 57% reduction in FLOPs while matching dense-model performance and preserving fine-grained, long-range temporal dependencies on hour-scale benchmarks.
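The Top-p idea in the last two bullets can be sketched generically: score the candidates (cubes or tokens), and keep the smallest set whose softmax mass reaches a threshold p, so peaked score distributions select few candidates and flat ones select many. The scoring function and threshold below are illustrative assumptions, not AdaSpark's actual implementation.

```python
import numpy as np

def top_p_select(scores, p=0.9):
    """Keep the smallest candidate set whose softmax mass reaches p.

    `scores` stands in for relevance scores (e.g., query-to-cube
    attention logits); this only illustrates generic Top-p (nucleus)
    selection, not the paper's exact scoring.
    """
    probs = np.exp(scores - scores.max())      # stable softmax
    probs /= probs.sum()
    order = np.argsort(-probs)                 # candidates by probability, descending
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, p)) + 1       # smallest prefix with mass >= p
    return order[:k]

# Peaked scores -> one cube suffices; flat scores -> all cubes are kept.
few = top_p_select(np.array([8.0, 1.0, 0.5, 0.2]), p=0.9)   # 1 candidate
many = top_p_select(np.array([1.0, 1.0, 1.0, 1.0]), p=0.9)  # 4 candidates
```

This is what makes the budget adaptive: the number of selected cubes/tokens is a function of the input's score entropy rather than a fixed sparsity ratio.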

Abstract

Processing long-form videos with Video Large Language Models (Video-LLMs) is computationally prohibitive. Current efficiency methods often compromise fine-grained perception through irreversible information disposal or inhibit long-range temporal modeling via rigid, predefined sparse patterns. This paper introduces AdaSpark, an adaptive sparsity framework designed to address these limitations. AdaSpark first partitions video inputs into 3D spatio-temporal cubes. It then employs two co-designed, context-aware components: (1) Adaptive Cube-Selective Attention (AdaS-Attn), which adaptively selects a subset of relevant video cubes to attend to for each query token, and (2) Adaptive Token-Selective FFN (AdaS-FFN), which selectively processes only the most salient tokens within each cube. An entropy-based (Top-p) selection mechanism adaptively allocates computational resources based on input complexity. Experiments demonstrate that AdaSpark reduces computational load by up to 57% FLOPs while maintaining performance comparable to dense models and preserving fine-grained, long-range dependencies, as validated on challenging hour-scale video benchmarks.
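The cube partitioning the abstract describes amounts to reshaping the video tensor into non-overlapping spatio-temporal blocks. A minimal sketch, assuming a `(T, H, W, C)` frame layout and an illustrative cube size of `(4, 16, 16)` (the summary does not specify AdaSpark's actual dimensions):

```python
import numpy as np

def partition_cubes(video, cube=(4, 16, 16)):
    """Split a video of shape (T, H, W, C) into non-overlapping 3D cubes.

    `cube=(t, h, w)` is a hypothetical size chosen for illustration.
    Assumes T, H, and W are divisible by the cube dimensions.
    Returns an array of shape (num_cubes, t, h, w, C).
    """
    T, H, W, C = video.shape
    t, h, w = cube
    x = video.reshape(T // t, t, H // h, h, W // w, w, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # bring the three cube indices to the front
    return x.reshape(-1, t, h, w, C)

# An 8-frame 32x32 clip yields 2 temporal x 2x2 spatial = 8 cubes.
cubes = partition_cubes(np.zeros((8, 32, 32, 3)))
```

These cubes are the units over which AdaS-Attn would then select, with AdaS-FFN filtering tokens inside each selected cube.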