A^3: Towards Advertising Aesthetic Assessment

arXiv cs.CV / 3/26/2026


Key Points

  • The paper introduces A^3 (Advertising Aesthetic Assessment), a framework intended to make advertising image evaluation more scalable, standardized, and interpretable than subjective methods.
  • It proposes a theory-driven evaluation paradigm, A^3-Law, with three hierarchical stages: Perceptual Attention (how well perceptual signals attract attention), Formal Interest (how well color and spatial composition evoke interest), and Desire Impact (how strongly the image evokes desire and persuades).
  • The authors build A^3-Dataset (120K instruction-response pairs from 30K ads) with rich multi-dimensional annotations and Chain-of-Thought rationales to support multimodal instruction-following.
  • They train a multimodal large language model, A^3-Align, using CoT-guided learning aligned to A^3-Law, and benchmark it on A^3-Bench.
  • Results on A^3-Bench indicate that A^3-Align aligns more closely with A^3-Law than existing models, and that this alignment generalizes to selecting high-quality advertisements and writing prescriptive critiques.

Abstract

Advertising images significantly impact commercial conversion rates and brand equity, yet current evaluation methods rely on subjective judgments, lacking scalability, standardized criteria, and interpretability. To address these challenges, we present A^3 (Advertising Aesthetic Assessment), a comprehensive framework encompassing four components: a paradigm (A^3-Law), a dataset (A^3-Dataset), a multimodal large language model (A^3-Align), and a benchmark (A^3-Bench). Central to A^3 is a theory-driven paradigm, A^3-Law, comprising three hierarchical stages: (1) Perceptual Attention, evaluating perceptual image signals for their ability to attract attention; (2) Formal Interest, assessing formal composition of image color and spatial layout in evoking interest; and (3) Desire Impact, measuring desire evocation from images and their persuasive impact. Building on A^3-Law, we construct A^3-Dataset with 120K instruction-response pairs from 30K advertising images, each richly annotated with multi-dimensional labels and Chain-of-Thought (CoT) rationales. We further develop A^3-Align, trained under A^3-Law with CoT-guided learning on A^3-Dataset. Extensive experiments on A^3-Bench demonstrate that A^3-Align achieves superior alignment with A^3-Law compared to existing models, and this alignment generalizes well to quality advertisement selection and prescriptive advertisement critique, indicating its potential for broader deployment. Dataset, code, and models can be found at: https://github.com/euleryuan/A3-Align.
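To make the dataset description concrete, here is a minimal sketch of how one instruction-response pair might be represented. Only the three stage names and the 120K / 30K counts come from the abstract. The class `A3Record`, its field names, and the sample values are hypothetical illustrations, and the released A^3-Dataset may use a different layout.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The three hierarchical A^3-Law stages named in the abstract."""
    PERCEPTUAL_ATTENTION = 1
    FORMAL_INTEREST = 2
    DESIRE_IMPACT = 3


@dataclass
class A3Record:
    # Hypothetical schema: these field names are assumptions, not the paper's.
    image_id: str      # identifier of the advertising image
    stage: Stage       # which A^3-Law stage the instruction targets
    instruction: str   # prompt given to the model
    rationale: str     # Chain-of-Thought reasoning text
    response: str      # final answer or label


def pairs_per_image(num_pairs: int, num_images: int) -> float:
    """Average number of instruction-response pairs per image."""
    return num_pairs / num_images


# Invented sample values, for illustration only.
example = A3Record(
    image_id="ad_000001",
    stage=Stage.FORMAL_INTEREST,
    instruction="Assess the color and spatial layout of this advertisement.",
    rationale="The palette is limited to two hues and the product is centered.",
    response="The composition is balanced.",
)

# Counts reported in the abstract: 120K pairs from 30K images.
print(pairs_per_image(120_000, 30_000))  # 4.0
```

The reported counts work out to an average of four pairs per image. The abstract does not say how those pairs are distributed across the three stages.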
