KiToke: Kernel-based Interval-aware Token Compression for Video Large Language Models

arXiv cs.CV / 4/7/2026

📰 NewsSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

KiTokeは、Video Large Language Modelsの推論コストを下げるために、学習なし（training-free）で映像トークンを圧縮する手法を提案している。
グローバルなトークン多様性をカーネルベースの冗長性指標で推定し、重要情報を保ったままコンテンツに応じてトークン選択を行う。
さらに、軽量な時間的インターバル構築とインターバルを考慮したトークン統合により、時間的な一貫性（temporal coherence）を維持する。
従来のローカル／セグメント単位のヒューリスティックに対して、動画全体のグローバル冗長性を明示的に扱う点が特徴で、極端なトークン予算（保持率1%まで）でも有効とされる。
複数の動画理解ベンチマークとVideo LLMバックボーンで、既存の学習なし圧縮手法より一貫して良い性能を示し、特に厳しい保持比率で大きな改善が報告されている。

Abstract

Video Large Language Models (Video LLMs) achieve strong performance on video understanding tasks but suffer from high inference costs due to the large number of visual tokens. We propose KiToke, a training-free, query-agnostic token compression approach that reduces spatiotemporal redundancy while preserving critical visual information. Our method estimates token diversity globally using a kernel-based redundancy measure, enabling content-adaptive selection that remains effective under extreme token budgets, and further introduces a lightweight temporal interval construction with interval-aware token merging to maintain temporal coherence. Unlike prior methods that rely on local or segment-level heuristics, KiToke explicitly captures global redundancy across an entire video, leading to more efficient token utilization. Extensive experiments on multiple video understanding benchmarks and Video LLM backbones demonstrate that KiToke consistently outperforms existing training-free compression methods, with particularly large gains at aggressive retention ratios down to 1%.

💡 Insights using this article

This article is featured in our daily AI news digest — key takeaways and action items at a glance.

📅 4/7DailyView insight →

Black Hat Asia

AI Business

Research with ChatGPT

Dev.to

Silicon Valley is quietly running on Chinese open source models and almost nobody is talking about it

Reddit r/LocalLLaMA

Why AI Product Quality Is Now an Evaluation Pipeline Problem, Not a Model Problem

Dev.to

The 10 Best AI Tools for SEO and Digital Marketing in 2026

Dev.to

KiToke: Kernel-based Interval-aware Token Compression for Video Large Language Models

Key Points

Abstract

💡 Insights using this article

Related Articles

Black Hat Asia

Research with ChatGPT

Silicon Valley is quietly running on Chinese open source models and almost nobody is talking about it

Why AI Product Quality Is Now an Evaluation Pipeline Problem, Not a Model Problem

The 10 Best AI Tools for SEO and Digital Marketing in 2026

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer