A Layer-wise Analysis of Supervised Fine-Tuning

arXiv cs.AI / 4/15/2026


Key Points

  • The paper studies how Supervised Fine-Tuning (SFT) gives rise to instruction-following behavior, and where the accompanying risk of catastrophic forgetting arises, by analyzing layer-level mechanisms across model scales from 1B to 32B parameters.
  • Experiments find a depth-dependent stability pattern: middle layers (about 20%–80%) remain stable, while final layers are significantly more sensitive to tuning.
  • Based on this, the authors propose Mid-Block Efficient Tuning, which selectively updates only the critical intermediate layers rather than applying uniform adaptation across the network.
  • The proposed method outperforms standard LoRA, including up to a 10.2% improvement on GSM8K for OLMo2-7B, with lower parameter overhead.
  • The authors report that alignment effects are more architecturally localized than fully distributed, and they provide public code for reproducibility.
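The selective-update idea above can be sketched as a simple layer-masking rule: freeze every transformer block outside the middle 20%–80% depth band and train only the blocks inside it. The helper below is a hypothetical illustration of that rule; the function names and the exact band fractions are assumptions, not the authors' implementation:

```python
def mid_block_indices(num_layers: int, lo: float = 0.2, hi: float = 0.8) -> list[int]:
    """Indices of the 'middle' transformer blocks kept trainable."""
    start = int(num_layers * lo)  # first block inside the band
    end = int(num_layers * hi)    # first block past the band
    return list(range(start, end))

def trainable_mask(num_layers: int) -> list[bool]:
    """True for blocks that receive gradient updates, False for frozen ones."""
    keep = set(mid_block_indices(num_layers))
    return [i in keep for i in range(num_layers)]

# For a 32-layer model, blocks 6 through 24 would stay trainable.
print(mid_block_indices(32)[0], mid_block_indices(32)[-1])  # → 6 24

# With a Hugging Face-style model, the mask would translate roughly to
# (the attribute path `model.model.layers` is an assumption):
#   for i, block in enumerate(model.model.layers):
#       block.requires_grad_(i in keep)
```

Compared with LoRA, which attaches low-rank adapters uniformly across depth, this approach concentrates all trainable parameters in the band the paper identifies as critical.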

Abstract

While critical for alignment, Supervised Fine-Tuning (SFT) incurs the risk of catastrophic forgetting, yet the layer-wise emergence of instruction-following capabilities remains elusive. We investigate this mechanism via a comprehensive analysis utilizing information-theoretic, geometric, and optimization metrics across model scales (1B–32B). Our experiments reveal a distinct depth-dependent pattern: middle layers (20%–80%) are stable, whereas final layers exhibit high sensitivity. Leveraging this insight, we propose Mid-Block Efficient Tuning, which selectively updates these critical intermediate layers. Empirically, our method outperforms standard LoRA by up to 10.2% on GSM8K (OLMo2-7B) with reduced parameter overhead, demonstrating that effective alignment is architecturally localized rather than distributed. The code is publicly available at https://anonymous.4open.science/r/base_sft.