UniMark: Unified Adaptive Multi-bit Watermarking for Autoregressive Image Generators

arXiv cs.CV / 4/15/2026


Key Points

  • The paper introduces UniMark, a training-free unified watermarking framework designed for autoregressive image generators to protect ownership and enable tracing of AI-generated images.
  • UniMark supports multi-bit (not just zero-bit) messages. Adaptive Semantic Grouping (ASG) replaces static codebook partitioning with a partition driven by semantic similarity and a secret key, which improves security, and Block-wise Multi-bit Encoding (BME) spreads message bits across token blocks with error-correcting codes for reliable extraction.
  • It includes a Unified Token-Replacement Interface (UTRI) to generalize watermark embedding across different autoregressive paradigms, such as next-token and next-scale prediction models.
  • The authors provide a theoretical analysis of detection error rates and embedding capacity, and report state-of-the-art results on three AR models for image quality (FID), watermark detection accuracy, and multi-bit message extraction.
  • Experiments show robustness to common real-world degradations and attacks including cropping, JPEG compression, Gaussian noise, blur, color jitter, and random erasing.
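
The key-driven partitioning and token replacement summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the greedy nearest-neighbour pairing, the HMAC-based choice of the "green" member of each pair, and every function name (`greedy_pairs`, `keyed_partition`, `embed`, `z_score`) are hypothetical. The z-score is the standard one-sided binomial test, assuming an unwatermarked token is green with probability 1/2; it is not taken from the paper's own analysis.

```python
import hashlib
import hmac
import math
import random


def greedy_pairs(codebook):
    """Greedily pair each entry with its nearest unpaired neighbour (Euclidean).

    Assumption: pairing similar entries stands in for "semantic grouping".
    """
    unpaired = set(range(len(codebook)))
    pairs = []
    while len(unpaired) >= 2:
        i = min(unpaired)
        unpaired.remove(i)
        j = min(unpaired, key=lambda k: math.dist(codebook[i], codebook[k]))
        unpaired.remove(j)
        pairs.append((i, j))
    return pairs


def keyed_partition(pairs, secret_key):
    """Pick one 'green' member per pair from HMAC(secret_key, pair)."""
    green, partner = set(), {}
    for i, j in pairs:
        digest = hmac.new(secret_key, f"{i}:{j}".encode(), hashlib.sha256).digest()
        green.add(i if digest[0] & 1 == 0 else j)
        partner[i], partner[j] = j, i
    return green, partner


def embed(tokens, green, partner):
    """Replace every non-green token with its (similar) green partner."""
    return [t if t in green else partner.get(t, t) for t in tokens]


def z_score(tokens, green):
    """One-sided binomial z-score against the null 'green with probability 1/2'."""
    n = len(tokens)
    hits = sum(t in green for t in tokens)
    return (hits - n / 2) / math.sqrt(n / 4)


# Toy usage: a 16-entry, 2-D "codebook" and a random token sequence.
rng = random.Random(0)
codebook = [(rng.random(), rng.random()) for _ in range(16)]
green, partner = keyed_partition(greedy_pairs(codebook), b"secret-key")
tokens = [rng.randrange(16) for _ in range(64)]
watermarked = embed(tokens, green, partner)
print(z_score(tokens, green), z_score(watermarked, green))
```

Because each replacement stays within a pair of nearby codebook entries, the change to the decoded image is kept small, while the partition itself cannot be reproduced without the key.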

Abstract

Invisible watermarking for autoregressive (AR) image generation has recently gained attention as a means of protecting image ownership and tracing AI-generated content. However, existing approaches suffer from three key limitations: (1) they embed only zero-bit watermarks for binary verification, lacking the ability to convey multi-bit messages; (2) they rely on static codebook partitioning strategies that are vulnerable to security attacks once the partition is exposed; and (3) they are designed for specific AR architectures, failing to generalize across diverse AR paradigms. We propose UniMark, a training-free, unified watermarking framework for autoregressive image generators that addresses all three limitations. UniMark introduces three core components: Adaptive Semantic Grouping (ASG), which dynamically partitions codebook entries based on semantic similarity and a secret key, ensuring both image quality preservation and security; Block-wise Multi-bit Encoding (BME), which divides the token sequence into blocks and encodes different bits across blocks with error-correcting codes for reliable message transmission; and a Unified Token-Replacement Interface (UTRI) that abstracts the watermark embedding process to support both next-token prediction (e.g., LlamaGen) and next-scale prediction (e.g., VAR) paradigms. We provide theoretical analysis on detection error rates and embedding capacity. Extensive experiments on three AR models demonstrate that UniMark achieves state-of-the-art performance in image quality (FID), watermark detection accuracy, and multi-bit message extraction, while maintaining robustness against cropping, JPEG compression, Gaussian noise, blur, color jitter, and random erasing attacks.
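
The block-wise encoding described in the abstract can be illustrated with a small sketch. It is a simplified stand-in, not the paper's algorithm: the abstract does not say which error-correcting code is used, so a repetition code with majority voting is swapped in here, and the partition simply treats token ids `2k` and `2k+1` as a pair with a key-seeded choice of the green member. All names (`make_partition`, `push`, `embed_message`, `extract_message`) are hypothetical.

```python
import random


def make_partition(vocab_size, key):
    """Assumed partition: ids 2k and 2k+1 form a pair; the key picks the green one."""
    rng = random.Random(key)
    return {base + rng.randrange(2) for base in range(0, vocab_size, 2)}


def push(token, bit, green):
    """Keep the token if its group already matches the bit, else swap to its partner."""
    return token if (token in green) == bool(bit) else token ^ 1


def encode_repetition(bits, r):
    """Repetition code: repeat every message bit r times."""
    return [b for b in bits for _ in range(r)]


def decode_repetition(code, r):
    """Majority vote over each run of r code bits."""
    return [int(2 * sum(code[i:i + r]) > r) for i in range(0, len(code), r)]


def embed_message(tokens, bits, green, r=3):
    """Split the sequence into one block per code bit and bias each block."""
    code = encode_repetition(bits, r)
    block = len(tokens) // len(code)
    if block == 0:
        raise ValueError("token sequence too short for this message")
    out = list(tokens)
    for b, bit in enumerate(code):
        for pos in range(b * block, (b + 1) * block):
            out[pos] = push(out[pos], bit, green)
    return out


def extract_message(tokens, n_bits, green, r=3):
    """Read each block as 1 if most of its tokens are green, then decode."""
    n_code = n_bits * r
    block = len(tokens) // n_code
    code = []
    for b in range(n_code):
        seg = tokens[b * block:(b + 1) * block]
        code.append(int(2 * sum(t in green for t in seg) > len(seg)))
    return decode_repetition(code, r)


# Toy usage: 8 message bits in a 256-token sequence over a 1024-entry codebook.
green = make_partition(1024, key=1234)
rng = random.Random(0)
tokens = [rng.randrange(1024) for _ in range(256)]
message = [1, 0, 1, 1, 0, 0, 1, 0]
watermarked = embed_message(tokens, message, green)
assert extract_message(watermarked, len(message), green) == message
```

With `r=3`, one fully corrupted block per message bit is still outvoted by the other two, which is the role the error-correcting code plays against localized attacks such as cropping or random erasing. A stronger code would trade capacity for robustness differently.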