Topology-Aware Layer Pruning for Large Vision-Language Models

arXiv cs.CV / 4/21/2026


Key Points

  • The paper introduces a topology-aware layer pruning framework for large vision-language models (LVLMs) to reduce computational and memory costs for deployment in resource-constrained settings.
  • It models the evolution of layer-wise hidden states as point clouds and uses simplicial complexes with zigzag persistent homology to measure inter-layer topological consistency.
  • The approach enables adaptive pruning that aims to keep transition-critical layers, addressing a key weakness of prior pruning methods that rely on local similarity or static proxy signals.
  • Experiments on multiple multimodal benchmarks show the method outperforms existing pruning baselines across a broad range of sparsity ratios.
  • The authors provide an open-source implementation at https://github.com/zpc456/TopoVLM.
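
To make the pruning idea concrete, here is a minimal, hypothetical sketch of the overall recipe: treat each layer's hidden states as a point cloud, summarize each cloud with a topological signature, and prune the layers whose signatures change least from the previous layer. As a drastic simplification, it uses 0-dimensional persistence death times (which equal the edge lengths of a Euclidean minimum spanning tree) in place of the paper's zigzag persistent homology over simplicial complexes, and an L1 distance between sorted barcodes as a crude stand-in for a proper barcode distance; all function names are illustrative, not from the authors' code.

```python
import math

def zeroth_persistence(points):
    """0-dimensional persistence death times of a point cloud.

    These equal the edge lengths of a Euclidean minimum spanning tree
    (computed here with Prim's algorithm) -- a simplified stand-in for
    the zigzag persistent homology used in the paper.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest connection of each point to the tree
    best[0] = 0.0
    deaths = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if best[u] > 0.0:
            deaths.append(best[u])  # MST edge length = a death time
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return sorted(deaths)

def barcode_distance(a, b):
    """L1 distance between sorted death-time vectors: a crude proxy for
    a bottleneck/Wasserstein distance between persistence barcodes."""
    return sum(abs(x - y) for x, y in zip(a, b))

def layers_to_prune(layer_clouds, k):
    """Score each layer i >= 1 by how much it changes the barcode of
    layer i-1; the k lowest-scoring (most topologically redundant)
    layers are pruned, so transition-critical layers survive."""
    bars = [zeroth_persistence(cloud) for cloud in layer_clouds]
    scores = {i: barcode_distance(bars[i - 1], bars[i])
              for i in range(1, len(bars))}
    return sorted(scores, key=scores.get)[:k]
```

Under this toy scoring, a layer whose point cloud is nearly identical to its predecessor's gets a score near zero and is pruned first, while a layer that rescales or reshapes the representation survives, mirroring the paper's goal of preserving transition-critical layers.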

Abstract

Large Language Models (LLMs) have demonstrated strong capabilities in natural language understanding and reasoning, while recent extensions that incorporate visual inputs enable them to process multimodal information. Despite these advances, Large Vision-Language Models (LVLMs) incur substantial computational and memory costs, hindering deployment in resource-constrained scenarios. Existing layer pruning methods typically rely on local similarity metrics or static proxy signals, failing to capture the global and dynamic evolution of representations across model depth, which often leads to the removal of transition-critical layers. To address this limitation, we propose a topology-aware layer pruning framework for LVLMs. Specifically, we represent layer-wise hidden states as point clouds and model their evolution using *simplicial complexes*. By leveraging *zigzag persistent homology*, we quantify inter-layer topological consistency and enable adaptive pruning that preserves critical representational transitions. Extensive experiments on diverse multimodal benchmarks demonstrate that the proposed framework consistently outperforms existing pruning methods across a wide range of sparsity ratios. Our code is available at https://github.com/zpc456/TopoVLM.