JND-Guided Neural Watermarking with Spatial Transformer Decoding for Screen-Capture Robustness
arXiv cs.CV / 3/31/2026
Key Points
- The paper proposes an end-to-end deep learning framework for screen-capture-robust neural watermarking that jointly optimizes watermark embedding and extraction under realistic camera/screen distortions.
- It introduces a noise simulation layer (including a physically motivated Moiré pattern generator) and adversarial training to improve robustness against coupled artifacts such as moiré, color-gamut shifts, perspective warping, and sensor noise.
- A JND (Just Noticeable Distortion) perceptual loss adaptively controls embedding strength by matching watermark residuals to a JND coefficient map, aiming to preserve visual quality.
- Two automatic localization components—foreground extraction via semantic segmentation and a symmetric noise-template mechanism for anti-cropping recovery—enable largely automated decoding in deployment-like conditions.
- Experiments report strong reconstruction/quality metrics (average PSNR ~30.94 dB, SSIM ~0.94) while embedding 127-bit payloads under the targeted screen-shooting channel.
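The physically motivated moiré generator in the noise simulation layer is not specified in detail here, but the underlying effect is well understood: moiré bands arise as the low-frequency interference between the screen's subpixel grid and the camera sensor's sampling grid. A minimal numpy sketch of that interference (function name and parameters are hypothetical, chosen for illustration):

```python
import numpy as np

def moire_pattern(h, w, f1=0.30, f2=0.33, theta=0.05, amp=0.1):
    """Illustrative moire synthesis: superimpose two sinusoidal gratings
    with slightly different frequency and orientation. Their product
    contains a low-frequency beat term -- the banding seen when a camera
    photographs a screen. (Hypothetical parameters; the paper's generator
    is physically motivated and more detailed.)"""
    y, x = np.mgrid[0:h, 0:w].astype(np.float64)
    g1 = np.sin(2 * np.pi * f1 * x)             # screen pixel grid
    xr = x * np.cos(theta) + y * np.sin(theta)  # slightly rotated sampling axis
    g2 = np.sin(2 * np.pi * f2 * xr)            # camera sensor grid
    return amp * g1 * g2                        # interference (beat) term

pattern = moire_pattern(64, 64)
```

In a differentiable noise layer, a pattern like this would be added to the watermarked image during training so the decoder learns to extract bits through the artifact.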
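The JND perceptual loss described above can be pictured as a hinge on the watermark residual: perturbations below the per-pixel JND threshold are treated as invisible and cost nothing, while only the excess over the threshold is penalized. A sketch under that assumption (the exact loss form in the paper may differ):

```python
import numpy as np

def jnd_loss(residual, jnd_map):
    """Illustrative JND-guided penalty (hypothetical form): penalize only
    the part of the watermark residual that exceeds the per-pixel JND
    threshold, so embedding strength concentrates where changes stay
    below the visibility limit."""
    excess = np.maximum(np.abs(residual) - jnd_map, 0.0)
    return float(np.mean(excess ** 2))
```

With this shape, a residual of 0.05 against a JND threshold of 0.1 incurs zero loss, while a residual of 0.3 against the same threshold is penalized by the squared overshoot.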
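On the decoding side, "spatial transformer decoding" refers to predicting warp parameters and resampling the captured photo back to the canonical image frame before bit extraction; for a screen photograph the dominant geometric distortion is a perspective (homography) warp. The classical closed-form version of that estimation, from four point correspondences, is the Direct Linear Transform (a sketch; the paper's decoder learns the warp end-to-end rather than solving it analytically):

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct Linear Transform: recover the 3x3 homography H mapping
    src points to dst points from four correspondences, as the null
    vector of the stacked constraint matrix."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the linear system A h = 0.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A))
    H = Vt[-1].reshape(3, 3)   # null vector = last right-singular vector
    return H / H[2, 2]         # normalize so H[2,2] == 1
```

Inverting the estimated H and resampling undoes the perspective warp, which is the geometric half of what the learned spatial transformer module performs implicitly.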


