Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization

arXiv cs.LG / 3/20/2026

Key Points

  • The paper identifies a squeezing effect in Direct Preference Optimization (DPO) where the probability of preferred responses declines during training due to high-curvature directions in logit space and negative-gradient updates.
  • It develops a theoretical framework that models coordinate-wise dynamics in logit space to explain how residuals expand along high-curvature directions, which underlie the squeezing phenomenon.
  • The authors demonstrate that Sharpness-Aware Minimization (SAM) can suppress this behavior via curvature regularization, and introduce logits-SAM, a computationally efficient variant that perturbs only the output layer.
  • Experiments on Pythia-2.8B, Mistral-7B, and Gemma-2B-IT show that logits-SAM consistently improves the effectiveness of DPO and integrates with existing DPO variants, with code available on GitHub.
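The core idea above (a SAM-style ascent-then-descent update that perturbs only the output-layer parameters) can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation; the function name `logits_sam_step`, the `perturb_mask` argument, and the toy quadratic loss in the usage example are all assumptions made here for clarity.

```python
import numpy as np

def logits_sam_step(w, grad_fn, lr=0.1, rho=0.05, perturb_mask=None):
    """One SAM-style ascent-then-descent update (illustrative sketch).

    perturb_mask selects which coordinates receive the sharpness
    perturbation -- for a logits-SAM-like scheme, only the output-layer
    parameters. All coordinates still receive the final descent step.
    """
    g = grad_fn(w)
    if perturb_mask is None:
        perturb_mask = np.ones_like(w, dtype=bool)
    g_masked = np.where(perturb_mask, g, 0.0)
    norm = np.linalg.norm(g_masked)
    eps = rho * g_masked / (norm + 1e-12)   # ascend toward the locally sharpest point
    g_adv = grad_fn(w + eps)                # gradient at the perturbed weights
    return w - lr * g_adv                   # descend using the perturbed gradient

# Usage on a toy quadratic loss f(w) = 0.5 * ||w||^2, whose gradient is w;
# only the last two coordinates play the role of "output-layer" parameters.
w0 = np.array([1.0, 1.0, 1.0, 1.0])
mask = np.array([False, False, True, True])
w1 = logits_sam_step(w0, lambda w: w, lr=0.1, rho=0.05, perturb_mask=mask)
```

Restricting the perturbation to a mask is what keeps the extra cost negligible: the adversarial ascent only needs gradients for the output layer, not the full backbone.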

Abstract

Direct Preference Optimization (DPO) has emerged as a popular algorithm for aligning pretrained large language models with human preferences, owing to its simplicity and training stability. However, DPO suffers from the recently identified squeezing effect (also known as likelihood displacement), where the probability of preferred responses decreases unintentionally during training. To understand and mitigate this phenomenon, we develop a theoretical framework that models the coordinate-wise dynamics in logit space. Our analysis reveals that negative-gradient updates cause residuals to expand rapidly along high-curvature directions, which underlies the squeezing effect, whereas Sharpness-Aware Minimization (SAM) can suppress this behavior through its curvature-regularization effect. Building on this insight, we investigate logits-SAM, a computationally efficient variant that perturbs only the output layer with negligible overhead. Extensive experiments on Pythia-2.8B, Mistral-7B, and Gemma-2B-IT across multiple datasets and benchmarks demonstrate that logits-SAM consistently improves the effectiveness of DPO and integrates seamlessly with other DPO variants. Code is available at https://github.com/RitianLuo/logits-sam-dpo.
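For readers unfamiliar with the objective being regularized here, the standard DPO loss on a single preference pair is the negative log-sigmoid of a scaled log-probability margin between the chosen and rejected responses, measured relative to a frozen reference model. A minimal sketch (argument names are illustrative, not from the paper's code):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_w / logp_l: summed token log-probabilities of the chosen and
    rejected responses under the policy being trained;
    ref_logp_w / ref_logp_l: the same quantities under the frozen
    reference model. beta scales the implicit reward margin.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

When the policy equals the reference, the margin is zero and the loss is log 2; the gradient then pushes the chosen response's relative log-probability up and the rejected one's down. It is this negative-gradient pressure on the rejected response that, per the paper's analysis, expands residuals along high-curvature logit directions and produces the squeezing effect.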