Towards Design Compositing

arXiv cs.CV / 4/17/2026



Key Points

The paper argues that modern graphic design generation often assumes the provided image/text/logo inputs are already stylistically harmonious, which breaks down when assets come from mismatched sources.
It proposes GIST, a training-free, identity-preserving image compositing module positioned between layout prediction and typography (design text) generation.
GIST can be plugged into existing components-to-design or design-refining pipelines without modifying them, aiming to improve harmony by stylizing/compositing inputs rather than keeping them unchanged.
Experiments integrating GIST with LaDeCo and Design-o-meter show improved visual harmony and aesthetic quality, validated by LLaVA-OV and GPT-4V using aspect-wise ratings and pairwise preferences versus naive pasting.

Abstract

Graphic design creation involves harmoniously assembling multimodal components, such as images, text, logos, and other visual assets collected from diverse sources, into a visually appealing and cohesive design. Recent methods have largely focused on layout prediction or complementary element generation while retaining input elements exactly, implicitly assuming that the provided components are already stylistically harmonious. In practice, inputs often come from disparate sources and exhibit visual mismatch, making this assumption limiting. We argue that identity-preserving stylization and compositing of input elements is a critical missing ingredient for truly harmonized components-to-design pipelines. To this end, we propose GIST, a training-free, identity-preserving image compositor that sits between layout prediction and typography generation and can be plugged into any existing components-to-design or design-refining pipeline without modification. We demonstrate this by integrating GIST with two substantially different existing methods, LaDeCo and Design-o-meter. GIST shows significant improvements in visual harmony and aesthetic quality across both pipelines, as validated by LLaVA-OV and GPT-4V on aspect-wise ratings and pairwise preference over naive pasting. Project Page: abhinav-mahajan10.github.io/GIST/.
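
The placement described above, a compositing stage between layout prediction and typography generation that can be swapped in without touching the other stages, can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Element` type, the stage functions, and the `harmonize` stand-in are invented for illustration and are not the GIST, LaDeCo, or Design-o-meter code. The stand-in only tags elements as restyled; it performs no real image processing.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Element:
    """Hypothetical design component; not a type from the paper."""
    name: str                                   # identity of the asset
    kind: str                                   # "image", "logo", or "text"
    box: Tuple[int, int, int, int] = (0, 0, 0, 0)  # x, y, w, h
    style: str = "original"


Stage = Callable[[List[Element]], List[Element]]


def predict_layout(elements: List[Element]) -> List[Element]:
    # Placeholder layout stage: stack the elements vertically.
    return [replace(e, box=(0, i * 100, 100, 100)) for i, e in enumerate(elements)]


def naive_paste(elements: List[Element]) -> List[Element]:
    # Baseline: pass the inputs through unchanged.
    return list(elements)


def harmonize(elements: List[Element]) -> List[Element]:
    # Stand-in for an identity-preserving compositor (assumption): it marks
    # image-like elements as restyled while keeping their name and box.
    return [
        replace(e, style="harmonized") if e.kind in ("image", "logo") else e
        for e in elements
    ]


def render_typography(elements: List[Element]) -> List[Element]:
    # Placeholder typography stage: marks text elements as typeset.
    return [replace(e, style="typeset") if e.kind == "text" else e for e in elements]


def run_pipeline(elements: List[Element], compositor: Stage = naive_paste) -> List[Element]:
    # The compositor slots in between layout and typography; the other
    # two stages are unchanged regardless of which compositor is used.
    placed = predict_layout(elements)
    composited = compositor(placed)
    return render_typography(composited)


assets = [
    Element("photo", "image"),
    Element("brand", "logo"),
    Element("headline", "text"),
]
baseline = run_pipeline(assets)
harmonized = run_pipeline(assets, compositor=harmonize)
```

Because the compositor is passed in as a plain function, the baseline and the harmonized run share the same layout and typography stages, which mirrors the comparison against naive pasting described in the abstract.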