Self-Adversarial One Step Generation via Condition Shifting

arXiv cs.CV / 4/15/2026


Key Points

  • The paper introduces APEX, a discriminator-free method for improving one-step text-to-image generation by deriving adversarial-style correction signals internally from a flow model via condition shifting.
  • APEX uses a shifted-condition branch whose velocity field acts as an independent estimator of the model’s current distribution, yielding a gradient that is theoretically GAN-aligned and avoids sample-dependent discriminator terms that can cause gradient vanishing.
  • The approach targets the common one-step tradeoffs—fidelity, inference speed, and training efficiency—by remaining stable during training and avoiding the GPU/memory overhead often seen with external discriminator-based methods.
  • Empirical results show strong one-step quality: a 0.6B APEX model reportedly surpasses the 12B FLUX-Schnell, and LoRA tuning on Qwen-Image 20B reaches a GenEval score of 0.89 at NFE=1 in ~6 hours, matching or exceeding the 50-step teacher.
  • The framework is described as architecture-preserving and plug-and-play, supporting both full fine-tuning and parameter-efficient LoRA tuning, with code released on GitHub.
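To make the core idea concrete, here is a minimal numpy sketch of a discriminator-free, condition-shifted correction signal. All names (`velocity`, `shift_condition`, `apex_style_correction`), the linear toy velocity field, and the fixed-offset shift are illustrative assumptions, not the paper's actual formulation; the point is only the structure: two evaluations of the same frozen flow model, under the true and the shifted condition, whose difference serves as the adversarial-style gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "velocity field" weights standing in for a trained flow
# model; the structure of the computation, not the values, is the point.
W_x = rng.normal(size=(4, 4))
W_c = rng.normal(size=(4, 4))

def velocity(x, t, c):
    """Velocity field of the (frozen) flow model at time t."""
    return x @ W_x * (1.0 - t) + c @ W_c

def shift_condition(c, delta):
    """Hypothetical condition shift: a fixed offset in embedding space."""
    return c + delta

def apex_style_correction(x, t, c, delta):
    """Difference between the velocity under the true condition and under
    the shifted condition, used as a discriminator-free correction signal
    (illustrative only)."""
    return velocity(x, t, c) - velocity(x, t, shift_condition(c, delta))

x = rng.normal(size=(2, 4))            # toy one-step generator outputs
c = rng.normal(size=(2, 4))            # toy condition embeddings
delta = 0.1 * rng.normal(size=(1, 4))  # toy shift direction

g = apex_style_correction(x, t=0.5, c=c, delta=delta)
print(g.shape)  # → (2, 4)
```

Because both branches evaluate the same frozen model, no external discriminator (and no extra discriminator memory) is involved; in this linear toy the sample-dependent terms cancel exactly, leaving only the condition-shift contribution.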

Abstract

The push for efficient text-to-image synthesis has moved the field toward one-step sampling, yet existing methods still face a three-way tradeoff among fidelity, inference speed, and training efficiency. Approaches that rely on external discriminators can sharpen one-step performance, but they often introduce training instability, high GPU memory overhead, and slow convergence, which complicates scaling and parameter-efficient tuning. In contrast, regression-based distillation and consistency objectives are easier to optimize, but they typically lose fine details when constrained to a single step. We present APEX, built on a key theoretical insight: adversarial correction signals can be extracted endogenously from a flow model through condition shifting. A condition-shifting transformation creates a shifted-condition branch whose velocity field serves as an independent estimator of the model's current generation distribution, yielding a gradient that is provably GAN-aligned and replaces the sample-dependent discriminator terms that cause gradient vanishing. This discriminator-free design is architecture-preserving, making APEX a plug-and-play framework compatible with both full-parameter and LoRA-based tuning. Empirically, our 0.6B model surpasses FLUX-Schnell 12B (20× more parameters) in one-step quality. With LoRA tuning on Qwen-Image 20B, APEX reaches a GenEval score of 0.89 at NFE=1 in 6 hours, surpassing the original 50-step teacher (0.87) and providing a 15.33× inference speedup. Code is available at https://github.com/LINs-lab/APEX.
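The "architecture-preserving, plug-and-play" claim rests on LoRA-style parameter-efficient tuning: the base weights stay frozen and only a low-rank update is trained. The following is a minimal sketch of that mechanism under toy sizes; the dimensions, the `lora_forward` name, and the scaling are assumptions for illustration, not APEX's actual training code.

```python
import numpy as np

rng = np.random.default_rng(1)

d, r = 8, 2  # feature dimension and LoRA rank (toy sizes)

W = rng.normal(size=(d, d))          # frozen base weight (untouched)
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero init

def lora_forward(x, scale=1.0):
    # Base path plus low-rank update: only A and B receive gradients,
    # so the base model's architecture and weights are preserved.
    return x @ W.T + scale * (x @ A.T) @ B.T

x = rng.normal(size=(3, d))
out = lora_forward(x)
# With B zero-initialized, the LoRA path is a no-op at the start of
# tuning, so training begins exactly from the base model's behavior.
assert np.allclose(out, x @ W.T)
```

This is why tuning a 20B model for a few hours is feasible: the trainable parameter count scales with `r * d` per adapted layer rather than with the full weight matrices.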