Training-Free Object-Background Compositional T2I via Dynamic Spatial Guidance and Multi-Path Pruning

arXiv cs.CV / 4/14/2026


Key Points

  • The paper identifies a common limitation in existing text-to-image diffusion models: a foreground bias that under-optimizes backgrounds, reducing global scene coherence and limiting compositional control.
  • It proposes a training-free sampling framework that explicitly models foreground–background interactions by restructuring diffusion inference rather than requiring model retraining.
  • Dynamic Spatial Guidance introduces a time-step-dependent gating mechanism to balance attention between foreground and background throughout the diffusion process.
  • Multi-Path Pruning explores multiple latent trajectories in parallel and dynamically filters candidates using attention statistics and external semantic alignment signals, retaining those that better satisfy object–background constraints.
  • The authors introduce a benchmark for object–background compositionality and report consistent improvements across multiple diffusion backbones.
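
The time-step-dependent gating described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the schedule shape, the `gamma` parameter, and the binary-mask formulation are all assumptions made here for clarity.

```python
import numpy as np

def spatial_gate(attn, fg_mask, t, T, gamma=2.0):
    """Soft, time-step-dependent gate balancing foreground/background attention.

    attn:    (H, W) attention map from a cross-attention layer
    fg_mask: (H, W) binary foreground mask (1 = foreground)
    t:       current diffusion step, counting down from T to 1
    gamma:   schedule sharpness (hypothetical hyperparameter)
    """
    # Early steps (large t) emphasize the background/layout;
    # late steps shift weight toward foreground refinement.
    w_bg = (t / T) ** gamma          # decays toward 0 as sampling proceeds
    w_fg = 1.0 - w_bg
    gated = attn * (w_fg * fg_mask + w_bg * (1.0 - fg_mask))
    return gated / (gated.sum() + 1e-8)   # renormalize to a distribution
```

At `t = T` the gate routes essentially all attention mass to background regions, and the balance reverses smoothly as `t` approaches 1, matching the intuition that diffusion models settle global layout early and local detail late.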

Abstract

Existing text-to-image diffusion models, while excelling at subject synthesis, exhibit a persistent foreground bias that treats the background as a passive, under-optimized byproduct. This imbalance compromises global scene coherence and constrains compositional control. To address this limitation, we propose a training-free framework that restructures diffusion sampling to explicitly account for foreground-background interactions. Our approach consists of two key components. First, Dynamic Spatial Guidance introduces a soft, time-step-dependent gating mechanism that modulates foreground and background attention during the diffusion process, enabling spatially balanced generation. Second, Multi-Path Pruning performs multi-path latent exploration and dynamically filters candidate trajectories using both internal attention statistics and external semantic alignment signals, retaining trajectories that better satisfy object-background constraints. We further develop a benchmark specifically designed to evaluate object-background compositionality. Extensive evaluations across multiple diffusion backbones demonstrate consistent improvements in background coherence and object-background compositional alignment.
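The trajectory-filtering step of Multi-Path Pruning can be sketched as a simple score-and-select loop. The scoring functions and the mixing weight `alpha` below are placeholders assumed for illustration; the paper's actual attention statistics and alignment model are not specified here.

```python
import numpy as np

def prune_paths(latents, attn_score_fn, align_score_fn, keep=2, alpha=0.5):
    """Score candidate latent trajectories and keep the best `keep`.

    latents:        list of candidate latents (one per sampling path)
    attn_score_fn:  internal attention statistic (higher = better fg/bg balance)
    align_score_fn: external semantic alignment signal, e.g. a CLIP-style
                    text-image score (hypothetical stand-in here)
    alpha:          weight mixing the two signals (assumed hyperparameter)
    """
    scores = [alpha * attn_score_fn(z) + (1 - alpha) * align_score_fn(z)
              for z in latents]
    order = np.argsort(scores)[::-1]           # best candidates first
    return [latents[i] for i in order[:keep]]  # surviving trajectories
```

In a full sampler this selection would run at chosen denoising steps: each surviving latent is branched into several candidates, scored, and pruned back to a fixed beam width, so compute stays bounded while the search favors trajectories that satisfy the object-background constraints.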