Towards Better Statistical Understanding of Watermarking LLMs

arXiv stat.ML / 4/8/2026


Key Points

  • This paper analyzes the trade-off in LLM watermarking between model distortion and detector effectiveness by formulating it as a constrained optimization problem around red-green list watermarking.
  • It derives an analytical property of the optimal solution and uses it to motivate a new online dual gradient ascent watermarking algorithm.
  • The authors prove the proposed method achieves asymptotic Pareto optimality, providing explicit guarantees that improve detection ability via increased green-list probability while controlling distortion.
  • They compare and justify watermark distortion metrics, arguing for KL divergence and highlighting problems with prior “distortion-free” and perplexity-based criteria.
  • Experiments on extensive datasets show the algorithm performs competitively against benchmark watermarking approaches, supporting the practical value of the theoretical formulation.
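To make the mechanism in the bullets above concrete, here is a minimal, hypothetical sketch of red-green list watermarking with an online dual-ascent update of the green-list bias. The function names (`green_list`, `watermark_step`, `dual_ascent_delta`), the hash-based partition, and the step-size choice are all illustrative assumptions, not the paper's actual implementation.

```python
import hashlib
import math

def green_list(prev_token: int, vocab_size: int, gamma: float = 0.5) -> set:
    # Illustrative red-green split: hash the previous token to get a
    # deterministic pseudo-random partition; a gamma fraction is "green".
    seed = hashlib.sha256(str(prev_token).encode()).hexdigest()
    scored = sorted(range(vocab_size),
                    key=lambda t: hashlib.sha256(f"{seed}:{t}".encode()).hexdigest())
    return set(scored[: int(gamma * vocab_size)])

def watermark_step(probs, green, delta):
    # Tilt the next-token distribution toward the green list with bias delta
    # (the exponential tilting commonly used in red-green watermarking sketches).
    weights = [p * math.exp(delta if t in green else 0.0)
               for t, p in enumerate(probs)]
    z = sum(weights)
    return [w / z for w in weights]

def dual_ascent_delta(delta, green_prob, target, step=0.5):
    # One online dual gradient ascent update (illustrative): raise the bias
    # when the realized green-list probability falls short of the target,
    # lower it otherwise, keeping delta nonnegative.
    return max(0.0, delta + step * (target - green_prob))
```

Usage would interleave these per token: compute the green list from the previous token, tilt the model's distribution, sample, observe the green-list mass, and update `delta` — so the bias adapts online toward the detection target while distortion stays controlled.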

Abstract

In this paper, we study the problem of watermarking large language models (LLMs). We consider the trade-off between model distortion and detection ability and formulate it as a constrained optimization problem based on the red-green list watermarking algorithm. We show that the optimal solution to the optimization problem enjoys a nice analytical property which provides a better understanding and inspires the algorithm design for the watermarking process. We develop an online dual gradient ascent watermarking algorithm in light of this optimization formulation and prove its asymptotic Pareto optimality between model distortion and detection ability. Such a result guarantees an explicit average increase in the green-list probability, and hence in detection ability (in contrast to previous results). Moreover, we provide a systematic discussion on the choice of the model distortion metric for the watermarking problem. We justify our choice of KL divergence and present issues with the existing criteria of "distortion-free" and perplexity. Finally, we empirically evaluate our algorithms on extensive datasets against benchmark algorithms.
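The abstract's distortion-versus-detection trade-off can be illustrated with a small numeric sketch: exponentially tilting a distribution toward a green set raises the green-list mass (detection ability) while the KL divergence from the original distribution (the distortion metric the paper argues for) grows with the bias. The helper names `exp_tilt` and `kl_divergence` and the tiny four-token distribution are assumptions for illustration only.

```python
import math

def exp_tilt(probs, green, delta):
    # Hypothetical helper mirroring red-green list watermarking:
    # exponentially tilt the distribution toward the green set by bias delta.
    w = [p * math.exp(delta if t in green else 0.0) for t, p in enumerate(probs)]
    z = sum(w)
    return [x / z for x in w]

def kl_divergence(p, q):
    # KL(q || p): distortion of the watermarked distribution q relative to
    # the original p (one common orientation; the paper's exact choice may differ).
    return sum(qi * math.log(qi / pi) for pi, qi in zip(p, q) if qi > 0)

# Sweeping the bias shows the trade-off: more green-list mass, more distortion.
p = [0.4, 0.3, 0.2, 0.1]
green = {0, 2}
for delta in (0.0, 0.5, 1.0):
    q = exp_tilt(p, green, delta)
    print(delta, sum(q[t] for t in green), kl_divergence(p, q))
```

At `delta = 0` the distributions coincide and the KL divergence is zero; both the green-list mass and the distortion increase monotonically in `delta`, which is exactly the Pareto frontier the constrained formulation navigates.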