PosterIQ: A Design Perspective Benchmark for Poster Understanding and Generation

arXiv cs.CV / 3/26/2026


Key Points

  • PosterIQ is introduced as a design-driven benchmark for poster understanding and generation, annotated with composition structure, typographic hierarchy, and semantic intent across real/professional/synthetic examples.
  • The dataset contains 7,765 image-annotation instances and 822 generation prompts, with tasks covering layout parsing, text-image correspondence, typography/readability and font perception, design-quality assessment, and controllable composition-aware generation (including metaphor).
  • Evaluations of state-of-the-art MLLMs and diffusion-based generators reveal persistent gaps in visual hierarchy, typographic semantics, saliency control, and accurate intention communication.
  • Results suggest commercial MLLMs excel in higher-level reasoning but function as insensitive automatic raters, while diffusion generators can render text well yet struggle with composition-aware synthesis.
  • The authors position PosterIQ as both a quantitative benchmark and a diagnostic tool to test and improve design reasoning in vision-language and generative systems using reproducible, task-specific metrics.

Abstract

We present PosterIQ, a design-driven benchmark for poster understanding and generation, annotated across composition structure, typographic hierarchy, and semantic intent. It includes 7,765 image-annotation instances and 822 generation prompts spanning real, professional, and synthetic cases. To bridge visual design cognition and generative modeling, we define tasks for layout parsing, text-image correspondence, typography/readability and font perception, design-quality assessment, and controllable, composition-aware generation with metaphor. We evaluate state-of-the-art MLLMs and diffusion-based generators, finding persistent gaps in visual hierarchy, typographic semantics, saliency control, and intention communication; commercial models lead on high-level reasoning but act as insensitive automatic raters, while generators render text well yet struggle with composition-aware synthesis. Extensive analyses show PosterIQ is both a quantitative benchmark and a diagnostic tool for design reasoning, offering reproducible, task-specific metrics. We aim to catalyze models' creativity and integrate human-centered design principles into generative vision-language systems.
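To make the idea of a reproducible, task-specific metric concrete, here is a minimal sketch of how a layout-parsing score might be computed: predicted element boxes are matched to annotated ones by intersection-over-union (IoU) and summarized as an F1 score. The corner-coordinate box format, the greedy matching, and the 0.5 threshold are illustrative assumptions, not PosterIQ's actual annotation schema or protocol.

```python
# Hypothetical layout-parsing metric: match predicted boxes to
# annotated boxes by IoU, then report an F1-style score.
# Box format (x1, y1, x2, y2) and greedy matching are assumptions.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def layout_f1(pred_boxes, gt_boxes, thresh=0.5):
    """Greedy one-to-one matching at an IoU threshold, then F1."""
    matched, tp = set(), 0
    for p in pred_boxes:
        best, best_iou = None, thresh
        for i, g in enumerate(gt_boxes):
            if i not in matched and iou(p, g) >= best_iou:
                best, best_iou = i, iou(p, g)
        if best is not None:
            matched.add(best)
            tp += 1
    prec = tp / len(pred_boxes) if pred_boxes else 0.0
    rec = tp / len(gt_boxes) if gt_boxes else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

A metric in this shape is deterministic given the annotations, which is what makes per-task scores comparable across models, in contrast to the insensitive MLLM-as-rater judgments the paper highlights.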