ASTRA: Enhancing Multi-Subject Generation with Retrieval-Augmented Pose Guidance and Disentangled Position Embedding
arXiv cs.CV / 4/16/2026
Key Points
- The paper addresses multi-subject, subject-driven image generation where models often fuse identities and distort poses when different subjects act in complex, distinct ways.
- It proposes ASTRA, a framework that disentangles subject appearance from pose structure within a unified Diffusion Transformer by combining retrieval-augmented pose guidance with specialized positional encoding.
- ASTRA uses a Retrieval-Augmented Pose (RAG-Pose) pipeline to supply an explicit structural prior, reducing entanglement between appearance and pose signals.
- The method introduces Enhanced Universal Rotary Position Embedding (EURoPE), which decouples identity tokens from spatial locations while tying pose tokens to the image canvas. It also adds a Disentangled Semantic Modulation (DSM) adapter that preserves identity through the text-conditioning stream.
- Experiments report state-of-the-art pose adherence on a COCO-based complex-pose benchmark while maintaining high identity fidelity and text alignment on DreamBench.
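
The positional-encoding idea in the key points can be illustrated with a minimal sketch of standard rotary position embedding (RoPE). This is not the paper's EURoPE implementation, whose details are not given in this summary. The `assign_position` helper and its rule (pose tokens reuse canvas coordinates, identity tokens all share one fixed position) are hypothetical assumptions chosen only to show how position assignment can tie some tokens to the canvas and detach others from it.

```python
import math

def rope_rotate(vec, pos, base=10000.0):
    """Apply standard 1D rotary position embedding to an even-length vector."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        # Each consecutive pair of features is rotated by its own frequency.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def rope_2d(vec, row, col):
    """Rotate the first half of the features by the row index, the second half by the column index."""
    half = len(vec) // 2
    return rope_rotate(vec[:half], row) + rope_rotate(vec[half:], col)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def assign_position(kind, row, col):
    """Hypothetical position rule (an assumption, not taken from the paper)."""
    if kind in ("image", "pose"):
        # Pose tokens share the canvas grid with the image tokens they guide.
        return (row, col)
    if kind == "identity":
        # Identity tokens ignore where the reference patch came from.
        return (0, 0)
    raise ValueError(f"unknown token kind: {kind}")

# Toy query/key features (arbitrary values, for illustration only).
q = [0.5, -1.0, 0.25, 2.0, 1.0, 0.0, -0.5, 0.75]
k = [1.0, 0.5, -0.25, 0.5, 0.0, 1.0, 2.0, -1.0]

# A pose token at the same canvas cell as the image query has zero relative
# offset, so the rotated score equals the unrotated dot product.
same_cell = dot(rope_2d(q, *assign_position("image", 3, 5)),
                rope_2d(k, *assign_position("pose", 3, 5)))

# Identity keys taken from different reference patches get the same position,
# so their score against a given query does not depend on the source location.
id_a = dot(rope_2d(q, 3, 5), rope_2d(k, *assign_position("identity", 4, 9)))
id_b = dot(rope_2d(q, 3, 5), rope_2d(k, *assign_position("identity", 0, 2)))
```

Under this rule, attention between image and pose tokens depends on their relative offset on the canvas, while attention to identity tokens is independent of where in the reference image they originated. Other choices for the identity position are equally plausible; the shared `(0, 0)` position is just the simplest one to demonstrate.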