Tail-Aware Information-Theoretic Generalization for RLHF and SGLD

arXiv stat.ML / 4/14/2026

Key Points

  • The paper introduces a tail-aware information-theoretic generalization framework for RLHF and stochastic optimization in regimes where losses or rewards are heavy-tailed, so that moment generating functions may not exist and classical KL/MGF-based bounds break down.
  • It models tail heaviness using a sub-Weibull parameter \(\theta\), with \(\theta=2\) corresponding to sub-Gaussian, \(\theta=1\) to sub-exponential, and \(0<\theta<1\) to genuinely heavy-tailed regimes (a standard form of this tail condition is sketched after this list).
  • A core technical result is a decorrelation lemma that controls change-of-measure expectations via a shifted-log \(f_\theta\)-divergence, with explicit comparisons to Rényi divergence that avoid MGF arguments.
  • The authors develop maximal inequalities and Dudley-type chaining bounds for sub-Weibull processes, with complexity terms scaling as \(\log^{1/\theta}\) and \(\text{entropy}^{1/\theta}\), and derive both in-expectation and high-probability PAC-Bayes generalization guarantees.
  • The framework is applied to Rényi-regularized RLHF under heavy-tailed rewards and to SGLD with heavy-tailed gradient noise, demonstrating how the new tail-dependent bounds can characterize generalization behavior in realistic RL settings.
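To make the tail-parameter convention and the \(\log^{1/\theta}\) scaling concrete, here is a minimal LaTeX sketch of a standard sub-Weibull tail condition and the maximal inequality it suggests; the normalization, constants, and exact statements in the paper may differ, so this is illustrative rather than a restatement of the authors' results.

```latex
% A common sub-Weibull tail condition with exponent parameter \theta,
% under which \theta = 2 recovers sub-Gaussian and \theta = 1
% sub-exponential tails (matching the convention in the key points):
\[
  \Pr\bigl(|X| \ge t\bigr) \;\le\; 2 \exp\!\bigl(-(t/K)^{\theta}\bigr),
  \qquad t \ge 0 .
\]
% For n variables satisfying this condition, a union-bound argument gives
% the tail-dependent maximal-inequality scaling quoted in the abstract:
\[
  \mathbb{E}\Bigl[\max_{1 \le i \le n} |X_i|\Bigr]
  \;\lesssim\; K \,(\log n)^{1/\theta} .
\]
```

For \(\theta=2\) this recovers the familiar sub-Gaussian \(\sqrt{\log n}\) rate, while smaller \(\theta\) inflates the exponent and hence the price paid for heavier tails.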

Abstract

Classical information-theoretic generalization bounds typically control the generalization gap through KL-based mutual information and therefore rely on boundedness or sub-Gaussian tails via the moment generating function (MGF). In many modern pipelines, such as robust learning, RLHF, and stochastic optimization, losses and rewards can be heavy-tailed, and MGFs may not exist, rendering KL-based tools ineffective. We develop a tail-dependent information-theoretic framework for sub-Weibull data, where the tail parameter \(\theta\) controls the tail heaviness: \(\theta=2\) corresponds to sub-Gaussian, \(\theta=1\) to sub-exponential, and \(0<\theta<1\) to genuinely heavy tails. Our key technical ingredient is a decorrelation lemma that bounds change-of-measure expectations using a shifted-log \(f_\theta\)-divergence, which admits explicit comparisons to Rényi divergence without MGF arguments. On the empirical-process side, we establish sharp maximal inequalities and a Dudley-type chaining bound for sub-Weibull processes with tail index \(\theta\), with complexity scaling as \(\log^{1/\theta}\) and \(\text{entropy}^{1/\theta}\). These tools yield expected and high-probability PAC-Bayes generalization bounds, as well as an information-theoretic chaining inequality based on multiscale Rényi mutual information. We illustrate the consequences in Rényi-regularized RLHF under heavy-tailed rewards and in stochastic gradient Langevin dynamics with heavy-tailed gradient noise.
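For readers unfamiliar with the regularizer used in the RLHF application, the sketch below records the standard Rényi divergence and one plausible way it could replace the usual KL penalty in an RLHF objective; the reward \(r\), reference policy \(\pi_{\mathrm{ref}}\), and coefficient \(\beta\) are assumed notation, and the paper's precise objective may differ.

```latex
% The Renyi divergence of order \alpha between distributions P and Q
% over responses y (a standard definition, not specific to this paper):
\[
  D_{\alpha}(P \,\|\, Q)
  \;=\; \frac{1}{\alpha - 1}
        \log \sum_{y} P(y)^{\alpha}\, Q(y)^{1-\alpha},
  \qquad \alpha > 0,\ \alpha \neq 1 .
\]
% One plausible schematic of a Renyi-regularized RLHF objective, with
% reward r, prompt distribution \mathcal{D}, reference policy \pi_{ref},
% and regularization strength \beta (all notation assumed here):
\[
  \max_{\pi}\;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\bigl[r(x, y)\bigr]
  \;-\; \beta\, \mathbb{E}_{x \sim \mathcal{D}}\Bigl[
      D_{\alpha}\bigl(\pi(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x)\bigr)
  \Bigr].
\]
```

As a toy illustration of the second application, the following Python sketch runs a plain SGLD loop whose stochastic gradient is corrupted by Student-t noise to emulate heavy-tailed gradient noise; the quadratic loss, the noise model, and all names (`sgld_heavy_tailed`, `tail_df`, `noise_scale`) are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def sgld_heavy_tailed(grad_fn, theta0, n_steps=1000, step_size=1e-2,
                      inv_temp=1.0, tail_df=2.5, noise_scale=0.1, seed=0):
    """Minimal SGLD loop in which the stochastic gradient carries
    heavy-tailed (Student-t) noise, a simple stand-in for the
    heavy-tailed gradient-noise regime mentioned in the abstract."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    trace = [theta.copy()]
    for _ in range(n_steps):
        # Heavy-tailed perturbation of the true gradient: Student-t with
        # small degrees of freedom has only finitely many moments.
        grad_noise = noise_scale * rng.standard_t(tail_df, size=theta.shape)
        noisy_grad = grad_fn(theta) + grad_noise
        # Langevin injection: isotropic Gaussian scaled by sqrt(2*eta/beta).
        langevin = np.sqrt(2.0 * step_size / inv_temp) * rng.standard_normal(theta.shape)
        theta = theta - step_size * noisy_grad + langevin
        trace.append(theta.copy())
    return np.array(trace)

# Toy usage: sample around the minimizer of the quadratic loss 0.5 * ||theta||^2.
trace = sgld_heavy_tailed(grad_fn=lambda t: t, theta0=np.zeros(2))
print(trace[-5:])
```

The occasional large Student-t draws produce the kind of outlier gradient steps that MGF-based analyses cannot absorb, which is exactly the regime the paper's tail-dependent bounds are designed to handle.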