ResPrune: Text-Conditioned Subspace Reconstruction for Visual Token Pruning in Large Vision-Language Models

arXiv cs.LG / 2026/3/24

💬 オピニオンSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

要点

ResPruneは、Large Vision-Language Modelsにおける冗長な視覚トークンを推論時に削減しつつ、重要なトークンを少数に絞って効率化する学習不要（training-free）の手法として提案されています。
その中核は、視覚トークンの選択を「部分空間の再構成（subspace reconstruction）」問題として定式化し、残差エネルギーに基づく貪欲なサブスペース拡張で元のトークン空間の幾何構造を保つ点にあります。
さらに、テキスト条件を使ってトークン選択を「指示（instruction）に対するテキスト関連性」でも条件付けし、情報量だけでなくクロスモーダル整合性も高める設計です。
ResPruneは軽量でモデル非依存（model-agnostic）で、既存のLVLMパイプラインに再学習や大幅なアーキテクチャ変更なしで組み込めるとされています。
LLaVA-1.5、LLaVA-NeXT、Qwen2.5-VLなど複数のバックボーンで、既存のプルーニング手法より広範なベンチマークで性能面の優位性を示しつつ、計算・メモリ・推論遅延の削減も達成したと報告されています。

Abstract

Large Vision-Language Models (LVLMs) rely on dense visual tokens to capture fine-grained visual information, but processing all these tokens incurs substantial computational and memory overhead during inference. To address this issue, we propose ResPrune, a training-free visual token pruning framework that enables efficient LVLM inference by selecting a compact yet informative subset of visual tokens. ResPrune formulates visual token pruning as a subspace reconstruction problem and employs a greedy subspace expansion strategy guided by residual energy, allowing it to preserve the geometric structure of the original visual token space. To further incorporate cross modal alignment, the selection process is conditioned on textual relevance, encouraging the retention of tokens that are both informative and instruction-relevant. The proposed method is lightweight and model-agnostic, and can be seamlessly integrated into existing LVLM pipelines without retraining or architectural modifications. Extensive experiments on multiple LVLM backbones, including LLaVA-1.5, LLaVA-NeXT, and Qwen2.5-VL, demonstrate that ResPrune consistently outperforms existing pruning approaches across a wide range of benchmarks, while achieving effective reductions in computation, memory consumption, and inference latency.

競艇×AI連動──流れを読む女、MIRIA。3/24(火)予告 🖤 本日のMIRIA式ブースト爆発的回収ならず😭惜しい展開続きました💦【MIRIA式競艇予想】

note

イーロン・マスク氏、AI半導体を1テラワット製造 8割を宇宙へ

日経XTECH

生成AIが「下手な鉄砲」型サイバー攻撃を増やす、足元固めを急ごう

日経XTECH

文書の内容を学習なしでLLMに反映、Sakana AIの新技術 RAG代替は可能か

日経XTECH

Google Stitch「バイブデザイン」登場—自然言語でUIを作る時代へ

Innovatopia

ResPrune: Text-Conditioned Subspace Reconstruction for Visual Token Pruning in Large Vision-Language Models

要点

Abstract

関連記事

競艇×AI連動──流れを読む女、MIRIA。3/24(火)予告 🖤 本日のMIRIA式ブースト爆発的回収ならず😭惜しい展開続きました💦【MIRIA式競艇予想】

イーロン・マスク氏、AI半導体を1テラワット製造 8割を宇宙へ

生成AIが「下手な鉄砲」型サイバー攻撃を増やす、足元固めを急ごう

文書の内容を学習なしでLLMに反映、Sakana AIの新技術 RAG代替は可能か

Google Stitch「バイブデザイン」登場—自然言語でUIを作る時代へ

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer