SwiftPie: Lightning-fast Subject-driven Image Personalization via One-step Diffusion

arXiv cs.CV / 5/5/2026

📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • SwiftPie is introduced as the first one-step diffusion approach for subject-driven image personalization, targeting real-time interactive use where prior methods were too slow or computationally heavy.
  • The method uses a novel dual-branch identity injection mechanism to integrate subject identity into a one-step diffusion pipeline effectively.
  • It further improves subject contextualization within a single denoising step by applying a mask-guided rescaling strategy.
  • Experiments show SwiftPie achieves faster personalized image generation while maintaining performance comparable to multi-step methods in both identity fidelity and prompt alignment.
  • The work suggests new opportunities for high-quality personalized image synthesis in interactive visual applications by reducing inference time.
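To make the two mechanisms above concrete, here is a minimal toy sketch of how a one-step personalization pass could combine a dual-branch identity injection with mask-guided rescaling. This is purely illustrative: the function name, the additive conditioning, the fusion weight `alpha`, and the rescaling factor `gamma` are all assumptions for exposition, not SwiftPie's actual architecture or API.

```python
import numpy as np

def one_step_personalize(noise, prompt_emb, identity_emb, mask,
                         alpha=0.5, gamma=1.5):
    """Toy sketch of a one-step personalization pass (illustrative only).

    Dual-branch identity injection: the single denoising step is conditioned
    on two branches, a text-prompt branch and a subject-identity branch,
    which are then fused. Mask-guided rescaling: features inside the subject
    mask are amplified so the identity stays prominent despite having only
    one denoising step. Real models would use a diffusion backbone here;
    simple additive conditioning stands in for it.
    """
    # Branch 1: text-conditioned prediction (stand-in for the backbone)
    text_branch = noise + prompt_emb
    # Branch 2: identity-conditioned prediction
    id_branch = noise + identity_emb
    # Fuse the two branches (dual-branch injection), weighted by alpha
    fused = (1.0 - alpha) * text_branch + alpha * id_branch
    # Mask-guided rescaling: scale the subject region by gamma, leave the
    # background (mask == 0) untouched
    rescale = 1.0 + (gamma - 1.0) * mask
    return fused * rescale
```

The key property of such a design is that identity conditioning and spatial emphasis are both applied inside the single forward pass, so no iterative refinement is needed.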

Abstract

Diffusion models have achieved remarkable success in high-quality image synthesis, sparking interest in image-guided generation tasks such as subject-driven image personalization. Despite their impressive personalization results, existing methods typically rely on computationally intensive fine-tuning, iterative optimization, or multi-step denoising processes, which significantly hinder their deployment and interactive capability in real-time applications. In this work, we present SwiftPie, the first one-step diffusion image personalization tool that enables lightning-fast generation of personalized images. SwiftPie introduces a novel dual-branch identity injection mechanism that effectively integrates subject identity into a one-step diffusion model. In addition, we incorporate a mask-guided rescaling strategy to further enhance subject contextualization within a single diffusion step. Extensive experiments demonstrate that SwiftPie not only delivers superior image personalization speed but also achieves comparable performance with multi-step approaches in both identity fidelity and prompt alignment. This work opens new opportunities for real-time, high-quality personalized image generation, paving the way for interactive visual synthesis.