Exponential families from a single KL identity

arXiv cs.LG / 5/1/2026

📰 News · Models & Research

Key Points

  • The paper isolates a single KL-divergence identity for exponential families that expresses differences of KL divergences in terms of the log-partition function A(λ) and the moment μ_q.
  • Combining this identity with nothing more than the nonnegativity of KL divergence, the authors derive multiple classical results (e.g., a generalized three-point identity and Pythagorean theorems for I-projections).
  • The derivations also recover key structural properties of exponential families, including convexity of A(λ), its Legendre dual expressed via KL, and the Gibbs variational principle.
  • The note further shows how the same framework yields optimization formulas relevant to KL-regularized reward maximization, including the exponential tilting identity used in entropy-regularized control and RLHF.
  • Additional analytic consequences include the gradient formula for A(λ), a Bregman representation for within-family KL, and surjectivity of the moment map.
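
Concretely, writing an exponential family as p_λ(x) = h(x) exp(⟨λ, T(x)⟩ − A(λ)) with μ_q = E_q[T(X)], the central identity the bullets refer to takes the following form (reconstructed from the abstract's description; the notation is an assumption, not quoted from the paper):

```latex
\mathrm{KL}(q \,\|\, p_{\lambda_2}) - \mathrm{KL}(q \,\|\, p_{\lambda_1})
  \;=\; A(\lambda_2) - A(\lambda_1) - \langle \lambda_2 - \lambda_1,\; \mu_q \rangle
```

The entropy term E_q[log q] and the base-measure term E_q[log h] appear in both KL divergences and cancel in the difference, which is why only A(λ) and μ_q survive.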

Abstract

Exponential families encompass the distributions central to modern machine learning -- softmax, Gaussians, and Boltzmann distributions -- and underlie the theory of variational inference, entropy-regularized reinforcement learning, and RLHF. We isolate a simple identity for exponential families that expresses the KL difference \mathrm{KL}(q \| p_{\lambda_2}) - \mathrm{KL}(q \| p_{\lambda_1}) in terms of the log-partition function A(\lambda) and the moment \mu_q. Remarkably, this identity together with the single fact that \mathrm{KL} \geq 0 (with equality iff p = q) suffices, by direct substitution and rearrangement, to derive a cluster of results that are classically obtained by separate, heavier arguments: a generalized three-point identity for arbitrary reference distributions, Pythagorean theorems for I-projections and reverse I-projections, convexity of the log-partition function, identification of its Legendre dual in KL terms, the Gibbs variational principle, and the explicit optimizer in KL-regularized reward maximization, including the exponential tilting formula underlying entropy-regularized control and RLHF. Beyond these purely algebraic consequences, standard analytic arguments recover the gradient formula for the log-partition function, the Bregman representation of within-family KL divergence, and the surjectivity of the moment map. The note is self-contained.
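
As a sanity check, both headline claims — the KL-difference identity and the exponential tilting optimizer for KL-regularized reward maximization — can be verified numerically on small finite examples. The Bernoulli parameterization, the three-letter alphabet, the rewards, and the regularization strength β below are illustrative choices, not taken from the paper:

```python
import math
import random

# --- Part 1: KL-difference identity for the Bernoulli family ---
# Natural parameterization: p_lam(x) ∝ exp(lam * x), x in {0, 1},
# so the log-partition function is A(lam) = log(1 + e^lam).

def A(lam):
    return math.log(1.0 + math.exp(lam))

def mean(lam):
    # Success probability of p_lam: the sigmoid of lam.
    return math.exp(lam) / (1.0 + math.exp(lam))

def kl_bernoulli(mu_q, mu_p):
    # KL(q || p) between Bernoulli distributions with means mu_q, mu_p.
    return (mu_q * math.log(mu_q / mu_p)
            + (1 - mu_q) * math.log((1 - mu_q) / (1 - mu_p)))

mu_q, lam1, lam2 = 0.3, -0.5, 1.2
lhs = kl_bernoulli(mu_q, mean(lam2)) - kl_bernoulli(mu_q, mean(lam1))
rhs = A(lam2) - A(lam1) - (lam2 - lam1) * mu_q
assert abs(lhs - rhs) < 1e-12  # the identity holds to float precision

# --- Part 2: exponential tilting in KL-regularized reward maximization ---
# On a finite alphabet, pi* = argmax_pi E_pi[r] - beta * KL(pi || pi_ref)
# is the tilted distribution pi*(x) ∝ pi_ref(x) * exp(r(x) / beta).
ref = [0.2, 0.5, 0.3]   # reference distribution (illustrative)
r = [1.0, -0.5, 2.0]    # rewards (illustrative)
beta = 0.7

w = [p * math.exp(ri / beta) for p, ri in zip(ref, r)]
pi_star = [wi / sum(w) for wi in w]

def objective(pi):
    reward = sum(pi_i * ri for pi_i, ri in zip(pi, r))
    kl = sum(pi_i * math.log(pi_i / ref_i) for pi_i, ref_i in zip(pi, ref))
    return reward - beta * kl

# pi* should dominate random full-support perturbations of itself.
best = objective(pi_star)
random.seed(0)
for _ in range(100):
    z = [pi_i * math.exp(random.uniform(-0.1, 0.1)) for pi_i in pi_star]
    pi = [zi / sum(z) for zi in z]
    assert objective(pi) <= best + 1e-12
```

Both checks pass: the cancellation of the E_q[log q] term makes the identity exact, and the tilted distribution attains the largest regularized objective among the sampled perturbations.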