
Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation

arXiv cs.AI / 3/12/2026


Key Points

  • The paper analyzes why the standard practical implementation of Sharpness-Aware Minimization (SAM) works and introduces eXplicit Sharpness-Aware Minimization (XSAM) to address its limitations for single-step and multi-step ascent.
  • It shows that the gradient at the ascent point, when applied to the current parameters, better approximates the direction toward the local maximum within the neighborhood than the local gradient alone.
  • XSAM explicitly estimates the ascent direction to improve the approximation and designs a search space that effectively leverages gradient information from multi-step ascent, with negligible additional computational cost.
  • The approach provides a unified formulation applicable to both single-step and multi-step settings and demonstrates consistent improvements over existing SAM variants in experiments.
  • Extensive experiments indicate XSAM offers superior generalization performance with only negligible computational overhead compared to prior methods.

Abstract

Sharpness-Aware Minimization (SAM) enhances generalization by minimizing the maximum training loss within a predefined neighborhood around the parameters. However, its practical implementation approximates this as one or more gradient-ascent steps followed by applying the gradient at the ascent point to update the current parameters. This practice can be justified as approximately optimizing the objective by neglecting the (full) derivative of the ascent point with respect to the current parameters. Nevertheless, a direct and intuitive understanding of why using the gradient at the ascent point to update the current parameters works so well is still lacking. Our work bridges this gap by proposing a novel and intuitive interpretation. We show that the gradient at the single-step ascent point, when applied to the current parameters, provides a better approximation of the direction from the current parameters toward the maximum within the local neighborhood than the local gradient does. This improved approximation thereby enables a more direct escape from the maximum within the local neighborhood. However, our analysis further reveals two issues. First, the approximation by the gradient at the single-step ascent point is often inaccurate. Second, the approximation quality may degrade as the number of ascent steps increases. To address these limitations, we propose eXplicit Sharpness-Aware Minimization (XSAM). It tackles the first issue by explicitly estimating the direction of the maximum during training, and the second by crafting a search space that effectively leverages the gradient information at the multi-step ascent point. XSAM features a unified formulation that applies to both single-step and multi-step settings and incurs only negligible computational overhead. Extensive experiments demonstrate the consistent superiority of XSAM over existing counterparts.
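The practical SAM update the abstract analyzes (ascend within the neighborhood, then apply the ascent-point gradient to the current parameters) can be sketched on a toy problem. This is a minimal illustration only: the quadratic loss, the step sizes, and the `n_ascent` parameter are assumptions for demonstration, and XSAM itself is not reproduced here.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w (illustrative choice, not from the paper).
A = np.diag([10.0, 1.0])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

def sam_step(w, rho=0.05, lr=0.05, n_ascent=1):
    """One practical SAM update:
    1) take n_ascent normalized gradient-ascent step(s) of total radius ~rho,
    2) apply the gradient at the ascent point to the *current* parameters w."""
    w_adv = w.copy()
    for _ in range(n_ascent):
        g = grad(w_adv)
        w_adv = w_adv + (rho / n_ascent) * g / (np.linalg.norm(g) + 1e-12)
    # Gradient at the ascent point, used to update the original parameters:
    return w - lr * grad(w_adv)

w = np.array([1.0, 1.0])
for _ in range(50):
    w = sam_step(w)
```

The ascent-point gradient differs from the local gradient by a curvature-dependent term; the paper's interpretation is that, applied at the current parameters, it better approximates the direction toward the neighborhood maximum and so escapes it more directly.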