Mind the Gap: Structure-Aware Consistency in Preference Learning

arXiv cs.LG / 5/1/2026

📰 News · Models & Research

Key Points

  • The paper argues that common preference-learning approaches like DPO, which optimize surrogate losses instead of true pairwise ranking loss, can be theoretically inconsistent under typical neural-network equicontinuous hypothesis sets.
  • It proposes a margin-shifted ranking framework for LLM alignment and derives rigorous H-consistency bounds that depend on enforcing a separation margin γ.
  • The work introduces Structure-Aware H-consistency and a new objective called SA-DPO, which adapts the margin using semantic distance to better handle synonyms and difficult (hard) preference pairs.
  • It provides an analysis of the trade-off between consistency and model capacity using a Margin-Capacity Profile, concluding that heavy-tailed surrogates (e.g., the Polynomial Hinge family) can yield better consistency guarantees than the standard logistic loss in DPO.
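To make the contrast in the last point concrete, here is a minimal sketch of the two loss shapes being compared: DPO's standard logistic surrogate versus an illustrative margin-enforcing polynomial hinge. The specific hinge form and parameters below are assumptions for illustration; the paper's exact Polynomial Hinge family may be parameterized differently.

```python
import math

def dpo_logistic_loss(margin, beta=0.1):
    """Standard DPO surrogate: -log sigmoid(beta * margin), where margin is
    the implicit reward gap between the chosen and rejected response.
    The logistic loss is strictly positive for any finite margin."""
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

def polynomial_hinge_loss(margin, gamma=1.0, p=2):
    """Illustrative polynomial hinge enforcing a separation margin gamma:
    zero loss once margin >= gamma, polynomial penalty below it.
    (Assumed form, not necessarily the paper's exact family.)"""
    return max(0.0, gamma - margin) ** p

# A pair the model already ranks correctly, but with a small margin:
m = 0.5
print(dpo_logistic_loss(m))       # still positive: the logistic loss never reaches zero
print(polynomial_hinge_loss(m))   # positive until the margin gamma = 1.0 is met
print(polynomial_hinge_loss(1.5)) # 0.0: margin satisfied, no further gradient pressure
```

The qualitative point: the logistic surrogate keeps pushing on every pair regardless of how well-separated it already is, whereas a margin-based surrogate goes exactly to zero once the separation margin γ is achieved.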

Abstract

Preference learning has become the foundation of aligning Large Language Models (LLMs) with human intent. Popular methods, such as Direct Preference Optimization (DPO), minimize surrogate losses as proxies for the intractable pairwise ranking loss. However, we demonstrate that for the equicontinuous hypothesis sets typical of neural networks, these standard surrogates are theoretically inconsistent, yielding vacuous generalization guarantees. To resolve this, we formulate LLM alignment within a margin-shifted ranking framework. We derive rigorous H-consistency bounds that depend on enforcing a separation margin γ. Crucially, we extend this to Structure-Aware H-consistency, introducing a novel objective (SA-DPO) that adapts the margin based on the semantic distance between responses to handle synonyms and hard pairs. Finally, we analyze the trade-off between consistency and limited model capacity via the Margin-Capacity Profile, proving that heavy-tailed surrogates (such as the Polynomial Hinge family) offer superior consistency guarantees for capacity-bounded models compared to the standard logistic loss used in DPO.
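The abstract's structure-aware idea (adapting the margin to the semantic distance between responses) can be sketched as follows. This is a hypothetical rendering: the function names `semantic_margin` and `sa_dpo_loss`, the linear distance scaling, and all constants are assumptions for illustration; the paper's actual SA-DPO objective may differ.

```python
import math

def semantic_margin(dist, gamma_max=1.0):
    """Hypothetical structure-aware margin: scale the required separation
    by the semantic distance dist in [0, 1] between the two responses.
    Near-synonyms (dist ~ 0) require almost no margin; clearly different
    responses demand the full separation gamma_max."""
    return gamma_max * dist

def sa_dpo_loss(margin, dist, beta=0.1, gamma_max=1.0):
    """Sketch of a margin-shifted, DPO-style objective: the usual logistic
    surrogate applied to the reward margin shifted by the adaptive,
    distance-dependent target margin. (Illustrative only.)"""
    shifted = beta * margin - semantic_margin(dist, gamma_max)
    return -math.log(1.0 / (1.0 + math.exp(-shifted)))

# Same reward margin, different semantic distances between the responses:
m = 5.0
print(sa_dpo_loss(m, dist=0.05))  # near-synonyms: small shift, low loss
print(sa_dpo_loss(m, dist=0.95))  # hard, clearly distinct pair: higher loss
```

Under this sketch, a pair of near-synonymous responses is not penalized for having a small reward gap, while a semantically distant "hard" pair must earn a larger separation before its loss falls, which matches the abstract's stated motivation for the adaptive margin.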