SimDiff: Depth Pruning via Similarity and Difference

arXiv cs.AI / 4/22/2026


Key Points

  • The paper introduces SimDiff, a new depth-pruning criterion for improving the inference efficiency of large language models by removing redundant layers.
  • Unlike prior approaches that rely mainly on layer-to-layer cosine similarity, SimDiff evaluates layers using two orthogonal, complementary signals: representational similarity and transformation difference.
  • It quantifies transformation difference with two metrics—MSSD (outlier-sensitive, emphasizing decisive corrections) and MASD (robust average contribution)—to avoid unpredictable or even catastrophic failures seen with single-heuristic methods.
  • Experiments across multiple models (0.5B–13B parameters) show SimDiff outperforms existing baselines across different pruning ratios, preserving over 91% of LLaMA2-7B performance at 25% pruning and enabling up to 1.49× inference speedup for LLaMA3.1-8B.
  • The authors report that heavily pruned models can be recovered effectively with minimal fine-tuning, suggesting practical deployability beyond one-shot pruning.
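The three per-layer signals above can be sketched concretely. This is a minimal illustration, not the paper's implementation: the acronyms MSSD and MASD are not expanded in the summary, so the definitions below (mean squared / mean absolute state difference over a layer's residual update) are assumptions chosen to match the stated behavior, i.e. squared differences amplify outlier "decisive corrections" while absolute differences give a robust average contribution.

```python
import numpy as np

def layer_signals(h_in: np.ndarray, h_out: np.ndarray):
    """Compute hedged stand-ins for SimDiff's per-layer signals.

    h_in, h_out: (tokens, hidden) hidden states entering and leaving one
    transformer layer. Returns (mean cosine similarity, MSSD, MASD),
    where MSSD/MASD are *assumed* to be the mean squared / mean absolute
    state difference; the paper may define them differently.
    """
    # Representational similarity: per-token cosine between input and output.
    cos = np.sum(h_in * h_out, axis=-1) / (
        np.linalg.norm(h_in, axis=-1) * np.linalg.norm(h_out, axis=-1) + 1e-8
    )
    diff = h_out - h_in  # the layer's residual update
    mssd = float(np.mean(diff ** 2))     # squaring emphasizes outlier corrections
    masd = float(np.mean(np.abs(diff)))  # robust average contribution
    return float(cos.mean()), mssd, masd
```

A layer whose output is nearly identical to its input scores high on similarity and near zero on both difference metrics, making it a pruning candidate under either signal.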

Abstract

Depth pruning improves the deployment efficiency of large language models (LLMs) by identifying and removing redundant layers. A widely accepted standard for this identification process is to measure the similarity between layers using cosine distance. However, we find that methods relying solely on this one-dimensional heuristic can exhibit unpredictable performance and even catastrophic collapse across different architectures. To address this issue, we propose SimDiff, a novel layer importance criterion that jointly evaluates layers from two orthogonal perspectives: representational similarity and transformation difference. The difference is quantified using two distinct metrics: MSSD, which is sensitive to outliers and identifies layers that make decisive corrections, and MASD, which robustly measures a layer's average contribution. Extensive experiments on multiple models ranging from 0.5B to 13B parameters demonstrate that SimDiff significantly outperforms state-of-the-art baselines across various pruning ratios. Notably, our method retains over 91% of LLaMA2-7B's performance at a 25% pruning ratio and achieves up to a 1.49x inference speedup when pruning 12 layers on LLaMA3.1-8B. We also show that pruned models can be effectively recovered with minimal fine-tuning.
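As a rough illustration of the one-shot pruning step the abstract describes, the sketch below ranks layers by an importance score and drops the least important fraction. The combination rule (low importance when a layer's output closely resembles its input and its transformation difference is small) is an assumption for illustration; the paper's actual criterion that fuses similarity with MSSD/MASD may differ.

```python
def prune_layers(similarity, difference, prune_ratio):
    """Return the ids of layers kept after one-shot depth pruning.

    similarity[i]: input/output representational similarity of layer i.
    difference[i]: transformation-difference signal of layer i.
    The scoring rule here is a hypothetical stand-in for SimDiff.
    """
    n = len(similarity)
    n_prune = int(n * prune_ratio)
    # Assumed importance: layers that change their input a lot
    # (low similarity, large difference) are considered important.
    scores = [(1.0 - s) + d for s, d in zip(similarity, difference)]
    order = sorted(range(n), key=lambda i: scores[i])  # ascending importance
    pruned = set(order[:n_prune])
    return [i for i in range(n) if i not in pruned]
```

At a 25% ratio on a 32-layer model, this removes the 8 layers whose combined signal marks them as most redundant, leaving the remaining stack in its original order.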