Benchmarking Layout-Guided Diffusion Models through Unified Semantic-Spatial Evaluation in Closed and Open Settings

arXiv cs.CV / 4/29/2026

📰 News · Models & Research

Key Points

  • The paper addresses a key challenge in evaluating layout-guided text-to-image diffusion models: measuring both semantic alignment to prompts and spatial fidelity to layouts, which is hard due to costly fine-grained annotations.
  • It introduces two benchmarks: a closed-set C-Bench with controlled prompt/layout complexity and an open-set O-Bench using real-world prompts and layouts to test performance “in the wild.”
  • The authors propose a unified evaluation protocol that combines semantic and spatial accuracy into a single score to enable consistent and comparable model ranking.
  • They run a large-scale evaluation of six state-of-the-art layout-guided diffusion models, generating and evaluating 319,086 images, and publish an overall ranking plus detailed breakdowns for text vs. layout alignment.
  • Additional analyses examine how model strengths and weaknesses vary across scenarios and prompt complexities, and the accompanying code is released on GitHub.
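The summary does not specify how the protocol folds the two accuracies into one score, so the following is only an illustrative sketch, not the authors' actual formula: a harmonic-mean combiner (in the spirit of an F1 score) that rewards models only when both semantic and spatial accuracy are high. The function name `unified_score` and the choice of harmonic mean are our assumptions.

```python
def unified_score(semantic_acc: float, spatial_acc: float) -> float:
    """Hypothetical combiner: harmonic mean of semantic accuracy
    (alignment with the prompt) and spatial accuracy (fidelity to
    the layout). NOT the paper's actual protocol.

    Both inputs are assumed to lie in [0, 1]; the result is 0 when
    either capability is 0, so a model cannot compensate for a total
    failure on one axis with strength on the other.
    """
    if semantic_acc + spatial_acc == 0:
        return 0.0
    return 2 * semantic_acc * spatial_acc / (semantic_acc + spatial_acc)


# Example: a model strong on text alignment but weak on layout
# adherence is pulled toward its weaker score.
score = unified_score(0.9, 0.3)
```

A harmonic mean is one natural choice here because an arithmetic mean would let a layout-ignoring model rank highly on prompt alignment alone, undermining the point of a joint ranking.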

Abstract

Evaluating layout-guided text-to-image generative models requires assessing both semantic alignment with textual prompts and spatial fidelity to prescribed layouts. Assessing layout alignment requires collecting fine-grained annotations, which is costly and labor-intensive. Consequently, current benchmarks rarely provide comprehensive layout evaluation and often remain limited in scale or coverage, making model comparison, ranking, and interpretation difficult. In this work, we introduce a closed-set benchmark (C-Bench) designed to isolate key generative capabilities while providing varying levels of complexity in both prompt structure and layout. To complement this controlled setting, we propose an open-set benchmark (O-Bench) that evaluates models using real-world prompts and layouts, offering a measure of semantic and spatial alignment in the wild. We further develop a unified evaluation protocol that combines semantic and spatial accuracy into a single score, ensuring consistent model ranking. Using our benchmarks, we conduct a large-scale evaluation of six state-of-the-art layout-guided diffusion models, totaling 319,086 generated and evaluated images. We establish a model ranking based on their overall performance and provide detailed breakdowns for text and layout alignment to enhance interpretability. Fine-grained analyses across scenarios and prompt complexities highlight the strengths and limitations of current models. Code is available at https://github.com/lparolari/cobench.