Online LLM watermark detection via e-processes

arXiv stat.ML / 4/13/2026


Key Points

  • The paper presents a unified framework for detecting LLM watermarks by modeling watermark detection as hypothesis testing for independence between generated tokens and a pseudo-random sequence.
  • It introduces e-process–based online testing procedures that provide anytime-valid guarantees, enabling detection without needing to commit to a fixed-length test in advance.
  • The authors propose empirically adaptive ways to construct e-processes to improve detection power, including for broader sequential-testing settings with independent pivotal statistics.
  • The work includes theoretical characterizations of statistical power and reports experiments showing competitive performance versus existing watermark detection approaches.
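To make the anytime-valid guarantee in the second point concrete, the standard construction can be written as follows. The notation here (pivotal statistics Y_t, per-step e-values e_t, wealth M_t, filtration F_t, level alpha) is chosen for illustration and is not taken from the paper.

```latex
\begin{align*}
M_t &= \prod_{s=1}^{t} e_s(Y_s), \qquad M_0 = 1, \qquad e_s \ge 0,\\
\mathbb{E}_{H_0}\!\left[ e_s(Y_s) \mid \mathcal{F}_{s-1} \right] &\le 1 \quad \text{for all } s \ge 1,\\
\mathbb{P}_{H_0}\!\left( \exists\, t \ge 1 : M_t \ge 1/\alpha \right) &\le \alpha .
\end{align*}
```

The last line is Ville's inequality for nonnegative supermartingales. Because the bound holds simultaneously over all t, a detector may stop and declare a watermark the first time M_t reaches 1/alpha, with no text length fixed in advance. Each e_s may be chosen using only Y_1, ..., Y_{s-1}, which is what leaves room for the data-adaptive constructions mentioned in the third point.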

Abstract

Watermarking for large language models (LLMs) has emerged as an effective tool for distinguishing AI-generated text from human-written content. Statistically, watermark schemes induce dependence between generated tokens and a pseudo-random sequence, reducing watermark detection to a hypothesis testing problem on independence. We develop a unified framework for LLM watermark detection based on e-processes, providing anytime-valid guarantees for online testing. We propose various methods to construct empirically adaptive e-processes that can enhance the detection power. The proposed methods are applicable to any sequential testing problem where independent pivotal statistics are available. In addition, theoretical results are established to characterize the power properties of the proposed procedures. Experiments demonstrate that the proposed framework achieves competitive performance compared to existing watermark detection methods.
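A minimal sketch of online testing with an e-process is given below; it is not the paper's construction. It assumes each token yields a pivotal statistic that is i.i.d. Uniform(0, 1) under the null (no watermark) and skews toward 1 when a watermark is present. The function name `online_eprocess_test`, the betting form `1 + lam * (2y - 1)`, the plug-in rule for `lam`, and the cap `lam_max` are all illustrative assumptions.

```python
import math


def online_eprocess_test(stats, alpha=0.01, lam_max=0.9):
    """Sequentially test H0: stats are i.i.d. Uniform(0, 1).

    Hypothetical sketch. Returns (stopping_index, log_wealth), where
    stopping_index is the 1-based position of the first rejection,
    or None if the stream ends without rejecting.
    """
    threshold = math.log(1.0 / alpha)  # reject once wealth >= 1/alpha
    log_wealth = 0.0
    sum_z = 0.0   # running sum of past centered statistics
    sum_z2 = 0.0  # running sum of their squares

    for n, y in enumerate(stats, start=1):
        # Bet size depends only on PAST observations (predictable),
        # so E[1 + lam * z | past] = 1 under H0 because E[z] = 0.
        if sum_z2 > 0.0:
            lam = min(max(sum_z / sum_z2, 0.0), lam_max)
        else:
            lam = 0.0  # no information yet: do not bet

        z = 2.0 * y - 1.0  # centered statistic in [-1, 1]
        # lam <= lam_max < 1 keeps 1 + lam * z strictly positive.
        log_wealth += math.log1p(lam * z)

        if log_wealth >= threshold:
            return n, log_wealth

        # Update the running sums only after the bet has been placed.
        sum_z += z
        sum_z2 += z * z

    return None, log_wealth
```

Calling `online_eprocess_test(ys, alpha=0.01)` on a stream of statistics `ys` stops at the first index where the accumulated wealth reaches 100. Under the stated null assumption, the probability of ever stopping is at most `alpha`, whatever the stream length. The plug-in bet is one simple way to adapt to the data; a different rule can be substituted as long as it uses only past observations.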