SEAL: Semantic-aware Single-image Sticker Personalization with a Large-scale Sticker-tag Dataset

arXiv cs.CV · April 30, 2026


Key Points

  • The paper introduces SEAL, a plug-and-play adaptation module for diffusion-based personalization that learns a sticker concept from a single reference image without modifying the underlying U-Net diffusion backbone.
  • SEAL targets two common single-image test-time fine-tuning failures—visual entanglement (background absorbed into the learned concept) and structural rigidity (over-memorizing reference spatial layouts)—by adding semantic/spatial and structural constraints during embedding adaptation.
  • The method uses three components during embedding adaptation: a Semantic-guided Spatial Attention Loss, a Split-merge Token Strategy, and Structure-aware Layer Restriction.
  • To enable attribute-level control and systematic evaluation, the authors release StickerBench, a large-scale sticker dataset with structured tags across six attributes (Appearance, Emotion, Action, Camera Composition, Style, Background).
  • Experiments indicate SEAL improves identity preservation while maintaining contextual controllability, and the authors state that code and the dataset will be publicly released.
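The core idea behind the first component, a semantic-guided spatial attention loss, can be illustrated with a minimal sketch: penalize the attention mass that the learned concept token places outside the subject's foreground region, so background pixels are not absorbed into the concept embedding. The function below is a hypothetical simplification, not the paper's actual loss; the mask source (e.g. a segmentation model) and attention extraction are assumed.

```python
import torch

def spatial_attention_loss(attn_map: torch.Tensor, fg_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of a semantic-guided spatial attention loss.

    attn_map: (H, W) cross-attention map of the learned concept token,
              normalized to sum to 1 over the spatial grid.
    fg_mask:  (H, W) binary foreground mask (1 = subject, 0 = background),
              assumed to come from an off-the-shelf segmenter.
    Returns the total attention mass leaking onto the background.
    """
    # Attention falling outside the subject mask is the "leakage"
    # that would entangle background artifacts with the concept.
    leakage = attn_map * (1.0 - fg_mask)
    return leakage.sum()

# Toy check: attention fully inside the mask incurs zero loss,
# attention fully outside incurs maximal loss.
attn = torch.zeros(4, 4)
attn[1, 1] = 1.0
mask_on_subject = torch.zeros(4, 4)
mask_on_subject[1, 1] = 1.0
```

In a real pipeline this term would be added to the usual denoising objective during embedding adaptation, pulling the concept token's attention onto the subject and leaving the background to the rest of the prompt.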

Abstract

Synthesizing a target concept from a single reference image is challenging in diffusion-based personalized text-to-image generation, particularly for sticker personalization where prompts often require explicit attribute edits. With only one reference, test-time fine-tuning (TTF) methods tend to overfit, producing *visual entanglement*, where background artifacts are absorbed into the learned concept, and *structural rigidity*, where the model memorizes reference-specific spatial configurations and loses contextual controllability. To address these issues, we introduce **SE**mantic-aware single-image sticker person**AL**ization (**SEAL**), a plug-and-play, architecture-agnostic adaptation module that integrates into existing personalization pipelines without modifying their U-Net-based diffusion backbones. SEAL applies three components during embedding adaptation: (1) a Semantic-guided Spatial Attention Loss, (2) a Split-merge Token Strategy, and (3) Structure-aware Layer Restriction. To support sticker-domain personalization with attribute-level control, we present StickerBench, a large-scale sticker image dataset with structured tags under a six-attribute schema (Appearance, Emotion, Action, Camera Composition, Style, Background). These annotations provide a consistent interface for varying context while keeping target identity fixed, enabling systematic evaluation of identity disentanglement and contextual controllability. Experiments show that SEAL consistently improves identity preservation while maintaining contextual controllability, highlighting the importance of explicit spatial and structural constraints during test-time adaptation. The code, StickerBench, and project page will be publicly released.
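The six-attribute schema described above lends itself to a simple structured representation: each sticker carries one tag per attribute, and prompts can be composed by varying the context tags while holding the identity token fixed. The record and prompt template below are illustrative assumptions, not the dataset's actual format or field values.

```python
# Hypothetical StickerBench-style tag record; field values are invented
# for illustration and do not come from the released dataset.
tags = {
    "Appearance": "orange cat with a blue scarf",
    "Emotion": "joyful",
    "Action": "waving",
    "Camera Composition": "close-up",
    "Style": "flat cartoon",
    "Background": "plain white",
}

def compose_prompt(identity_token: str, tags: dict) -> str:
    """Build a generation prompt that varies contextual attributes
    while keeping the personalized identity token fixed."""
    return (
        f"a {tags['Style']} sticker of {identity_token}, "
        f"{tags['Emotion']}, {tags['Action']}, "
        f"{tags['Camera Composition']}, on a {tags['Background']} background"
    )
```

Evaluating identity disentanglement then amounts to sweeping one attribute (say, Action) across its tag vocabulary and checking that the rendered subject stays consistent with the single reference image.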
