Collaborative Multi-Mode Pruning for Vision-Language Models
arXiv cs.CV / 4/6/2026
Key Points
- The paper proposes Collaborative Multi-Mode Pruning (CoMP) to compress vision-language models more effectively on resource-constrained devices by jointly pruning both parameters and tokens rather than using a single pruning mode.
- It introduces a Collaborative Importance Metric (CIM) that models the mutual interference between parameters and tokens, so that removing components in one mode does not distort importance scores in the other (a minimal scoring sketch follows this list).
- It develops a Multi-Mode Pruning Strategy (MPS) that splits pruning into stages and adaptively switches among pruning modes based on estimated pruning cost, historical cost, and random exploration, avoiding unstable behavior and local optima (see the mode-selection sketch after this list).
- Experiments across multiple vision-language tasks and models show CoMP maintains stronger performance under high pruning ratios compared with state-of-the-art single-mode approaches.
- The authors provide an open-source implementation of CoMP via a public GitHub repository.
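The summary does not give CIM's exact formula, but the idea of scoring parameters and tokens jointly can be illustrated as follows. This is a minimal sketch, not the paper's method: the function name `collaborative_importance`, the mixing coefficient `alpha`, and the specific L2-norm and channel-usage heuristics are all assumptions made here for illustration.

```python
import numpy as np

def collaborative_importance(weight, token_acts, alpha=0.5):
    """Hypothetical joint importance scoring in the spirit of CIM.

    weight:     (out_dim, in_dim) parameter matrix of one linear layer.
    token_acts: (num_tokens, in_dim) activations of the tokens feeding it.
    alpha:      assumed mixing coefficient between the two signals.
    """
    # Per-token importance: L2 norm of each token's activation (assumed proxy).
    token_imp = np.linalg.norm(token_acts, axis=1)            # (num_tokens,)
    token_imp = token_imp / (token_imp.sum() + 1e-8)

    # Weigh each input channel by how much the important tokens actually use it,
    # so parameter ranking reflects the tokens that will survive pruning.
    channel_usage = (token_imp[:, None] * np.abs(token_acts)).sum(axis=0)  # (in_dim,)
    param_imp = np.abs(weight) * channel_usage[None, :]       # (out_dim, in_dim)

    # Blend raw magnitude with the token-conditioned score, so pruning tokens
    # does not silently flip the parameter ranking (the "mutual interference"
    # the paper aims to model).
    param_imp = alpha * np.abs(weight) + (1 - alpha) * param_imp
    return param_imp, token_imp
```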
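Likewise, the stage-wise mode switching described for MPS can be sketched as an epsilon-greedy selection over pruning modes. Everything below is assumed for illustration: the mode names, the `beta` blending weight between the current cost estimate and history, and the `estimate_cost` / `apply_pruning` callbacks are hypothetical, not the authors' implementation.

```python
import random

MODES = ["param", "token", "joint"]  # hypothetical pruning modes

def choose_mode(est_cost, hist_cost, epsilon=0.1, beta=0.7):
    """Pick the pruning mode for this stage (sketch, not the paper's rule).

    est_cost:  {mode: estimated cost of pruning in that mode right now}
    hist_cost: {mode: running average of costs observed in past stages}
    epsilon:   random-exploration rate, to escape local optima.
    beta:      assumed weight on the current estimate vs. history.
    """
    if random.random() < epsilon:
        return random.choice(MODES)  # exploration step
    # Exploitation: blend the current estimate with historical cost so a
    # single noisy estimate cannot flip the schedule (stability).
    blended = {m: beta * est_cost[m] + (1 - beta) * hist_cost[m] for m in MODES}
    return min(blended, key=blended.get)

def prune_in_stages(num_stages, estimate_cost, apply_pruning):
    """Run pruning in stages, re-selecting the mode at each stage."""
    hist = {m: 0.0 for m in MODES}
    for stage in range(num_stages):
        est = {m: estimate_cost(m, stage) for m in MODES}
        mode = choose_mode(est, hist)
        realized = apply_pruning(mode, stage)  # prune one increment, measure cost
        hist[mode] = 0.5 * hist[mode] + 0.5 * realized  # update history
```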