TRACE: Structure-Aware Character Encoding for Robust and Generalizable Document Watermarking

arXiv cs.CV / 3/16/2026

Key Points

  • TRACE proposes a structure-aware diffusion-based framework for robust and generalizable document watermarking by encoding data at the character level.
  • Its adaptive diffusion initialization uses a movement probability estimator (MPE), target point estimation (TPE), and a mask drawing model (MDM) to identify handle points, target points, and editing regions.
  • Guided diffusion encoding moves the selected points, and masked region replacement with a specialized loss minimizes feature alterations after diffusion.
  • Experimental results show more than 5 dB PSNR improvement and about 5% higher extraction accuracy after cross-media transmission, outperforming state-of-the-art methods.
  • The approach generalizes across multiple languages and fonts, increasing practicality for real-world document security applications.
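For readers unfamiliar with the two metrics quoted in the results, the following is a minimal sketch using their standard textbook definitions. It is not taken from the paper: the function names `psnr` and `bit_accuracy` are hypothetical, images are assumed to be 8-bit grayscale given as lists of rows, and "extraction accuracy" is assumed here to mean the fraction of embedded bits recovered correctly.

```python
import math


def psnr(original, watermarked, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-sized grayscale images.

    Both images are lists of rows of pixel intensities (assumed 0..max_value).
    """
    flat_a = [p for row in original for p in row]
    flat_b = [p for row in watermarked for p in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


def bit_accuracy(embedded, extracted):
    """Fraction of bits recovered correctly (assumed meaning of extraction accuracy)."""
    if not embedded or len(embedded) != len(extracted):
        raise ValueError("bit sequences must be non-empty and equally long")
    correct = sum(1 for a, b in zip(embedded, extracted) if a == b)
    return correct / len(embedded)
```

Under this definition, a 5 dB gain in PSNR corresponds to reducing the mean squared error by a factor of about 3.2 (10^0.5), which gives a sense of scale for the reported improvement.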

Abstract

We propose TRACE, a structure-aware framework leveraging diffusion models for localized character encoding to embed data. Unlike existing methods that rely on edge features or pre-defined codebooks, TRACE exploits character structures that provide inherent resistance to noise interference due to their stability and unified representation across diverse characters. Our framework comprises three key components: (1) adaptive diffusion initialization that automatically identifies handle points, target points, and editing regions through specialized algorithms including a movement probability estimator (MPE), target point estimation (TPE), and a mask drawing model (MDM), (2) guided diffusion encoding for precise movement of the selected points, and (3) masked region replacement with a specialized loss function to minimize feature alterations after the diffusion process. Comprehensive experiments demonstrate TRACE's superior performance over state-of-the-art methods, achieving more than 5 dB improvement in PSNR and 5% higher extraction accuracy following cross-media transmission. TRACE achieves broad generalizability across multiple languages and fonts, making it particularly suitable for practical document security applications.
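The abstract does not spell out how masked region replacement works. One plausible reading, sketched below purely as an assumption, is a per-pixel composite: pixels inside the editing mask come from the diffusion-edited image, and all other pixels are restored from the original so that content outside the editing region is unchanged. The function name `masked_replace` and the list-of-rows image format are hypothetical, and the paper's specialized loss is not modeled here.

```python
def masked_replace(original, edited, mask):
    """Composite two equal-sized images using a binary mask.

    Takes the pixel from `edited` where the mask is truthy and from
    `original` elsewhere. This is an illustrative assumption, not the
    authors' implementation.
    """
    if not (len(original) == len(edited) == len(mask)):
        raise ValueError("inputs must have the same number of rows")
    result = []
    for o_row, e_row, m_row in zip(original, edited, mask):
        if not (len(o_row) == len(e_row) == len(m_row)):
            raise ValueError("rows must have the same length")
        # Keep edited pixels only inside the editing region.
        result.append([e if m else o for o, e, m in zip(o_row, e_row, m_row)])
    return result


# Toy 2x3 example: only the masked pixels take the edited values.
original = [[10, 10, 10], [10, 10, 10]]
edited = [[99, 98, 97], [96, 95, 94]]
mask = [[0, 1, 0], [0, 1, 1]]
composite = masked_replace(original, edited, mask)
```

Restricting changes to the masked region in this way would limit visible distortion to the strokes being moved, which is consistent with the stated goal of minimizing feature alterations after diffusion.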