RCP: Representation Consistency Pruner for Mitigating Distribution Shift in Large Vision-Language Models
arXiv cs.CV / 4/8/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper identifies a key problem with existing visual-token pruning for large vision-language models: irreversibly removing tokens shifts hidden-state distributions and causes substantial accuracy loss.
- It proposes RCP (Representation Consistency Pruner), which uses a cross-attention pruner to generate cumulative, monotonic masks that consistently reduce visual tokens across LLM layers.
- To mitigate information loss from pruning, RCP adds a delayed repair adapter (DRA) that caches pruned-token “essence” and applies FiLM-style modulation to answer-generation tokens.
- Training uses a repair loss to match first- and second-order statistics between pruned representations and a full-token teacher, while inference remains efficient via physical token discarding.
- Experiments on LVLM benchmarks show up to 88.9% visual-token removal and up to 85.7% FLOPs reduction with only a marginal drop in average accuracy, outperforming prior training-free pruning approaches.
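The mechanisms in the key points above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the function names (`cumulative_mask`, `repair_loss`, `film_modulate`), the top-k scoring rule, and the equal weighting of the mean and standard-deviation terms are all assumptions; the paper's actual pruner operates on cross-attention scores over hidden states, not scalar token scores.

```python
# Illustrative sketch of three RCP ingredients (names and details assumed):
# 1) cumulative, monotonic pruning masks, 2) a repair loss matching first-
# and second-order statistics against a full-token teacher, 3) FiLM-style
# scale-and-shift modulation of answer-generation tokens.
from statistics import fmean, pstdev


def cumulative_mask(prev_mask, scores, keep_k):
    """Keep the top-keep_k tokens among those still alive at this layer.
    A token dropped at an earlier layer can never return (monotonic)."""
    alive = [i for i, m in enumerate(prev_mask) if m]
    kept = set(sorted(alive, key=lambda i: scores[i], reverse=True)[:keep_k])
    return [i in kept for i in range(len(prev_mask))]


def repair_loss(pruned, teacher):
    """Penalize the gap in mean (first-order) and standard deviation
    (second-order) between pruned and full-token representations."""
    d_mean = fmean(pruned) - fmean(teacher)
    d_std = pstdev(pruned) - pstdev(teacher)
    return d_mean ** 2 + d_std ** 2


def film_modulate(tokens, gamma, beta):
    """FiLM-style modulation: per-feature scale and shift, here standing
    in for the DRA conditioning derived from cached pruned-token essence."""
    return [gamma * t + beta for t in tokens]


# Token 2 was already pruned, so it stays pruned even with a high score.
print(cumulative_mask([True, True, False, True], [0.9, 0.1, 0.8, 0.5], 2))
# Identical representations incur zero repair loss.
print(repair_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

In this toy version, monotonicity follows directly from only ranking tokens whose mask bit is still set; the real pruner enforces the same property by intersecting each layer's mask with the previous one.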