Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency
arXiv cs.CL / 4/28/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- The study investigates compressing existing large vision-language models (LVLMs) via structured pruning on the language-model backbone, followed by lightweight recovery training.
- It compares layerwise vs. widthwise pruning and finds that widthwise pruning generally preserves performance better in low-resource settings, i.e., with limited compute or finetuning data.
- Recovery training is analyzed under data scarcity, showing that effective recovery is possible with only 5% of the original data while retaining over 95% of baseline performance.
- For small compression levels, finetuning only the multimodal projector is sufficient, and combining supervised finetuning with hidden-state distillation produces the best recovery across pruning strengths.
- Experiments across three representative LVLM families (3B–7B parameters) provide practical guidance for deploying LVLMs on edge devices without extensive computation or abundant data.
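The two core techniques in the key points — removing whole hidden units from the language-model backbone (widthwise structured pruning) and recovering quality by matching the teacher's hidden states — can be sketched as below. This is an illustrative NumPy example, not the paper's implementation: the L2-norm importance score, the two-matrix MLP shape, and the plain MSE distillation loss are all assumptions chosen for clarity.

```python
import numpy as np

def widthwise_prune(W_in, W_out, keep_ratio=0.5):
    """Structured width pruning of a two-matrix MLP block.

    W_in:  (hidden, d_model) -- projects into the hidden dimension
    W_out: (d_model, hidden) -- projects back to the model dimension
    Removes whole hidden units (rows of W_in / columns of W_out),
    scored here by a norm-based importance criterion (an assumption;
    the paper may use a different saliency measure).
    """
    hidden = W_in.shape[0]
    keep = max(1, int(hidden * keep_ratio))
    # Importance of each hidden unit: product of its in/out weight norms.
    scores = np.linalg.norm(W_in, axis=1) * np.linalg.norm(W_out, axis=0)
    idx = np.sort(np.argsort(scores)[-keep:])  # indices of units to keep
    return W_in[idx, :], W_out[:, idx]

def hidden_state_distill_loss(student_h, teacher_h):
    """MSE between student and teacher hidden states -- one common form of
    the hidden-state distillation objective used during recovery."""
    return float(np.mean((student_h - teacher_h) ** 2))

# Prune a toy block to 25% of its hidden width.
rng = np.random.default_rng(0)
W_in, W_out = rng.normal(size=(64, 32)), rng.normal(size=(32, 64))
W_in_p, W_out_p = widthwise_prune(W_in, W_out, keep_ratio=0.25)
print(W_in_p.shape, W_out_p.shape)  # (16, 32) (32, 16)
```

Because whole units are dropped rather than individual weights zeroed, the pruned matrices are genuinely smaller, which is what makes this style of pruning attractive for edge deployment.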
Related Articles

- Black Hat USA (AI Business)
- Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption. (Dev.to)
- How to Build Traceable and Evaluated LLM Workflows Using Promptflow, Prompty, and OpenAI (MarkTechPost)
- AI Coding Tools Comparison 2026: Claude Code vs Cursor vs Gemini CLI vs Codex (Dev.to)
- How I Improved My YouTube Shorts and Podcast Audio Workflow with AI Tools (Dev.to)