$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models

arXiv cs.CV · April 28, 2026

📰 News · Models & Research

Key Points

  • The paper argues that standard Classifier-Free Guidance (CFG) misses the data manifold’s intrinsic curvature, motivating multi-step zigzag-style trajectories for better semantic alignment.
  • It proposes “Implicit Z-Sampling,” which algebraically collapses the zigzag’s intermediate states, eliminating off-manifold truncation errors and the cost of explicit zigzag evaluations.
  • Building on this, the authors introduce “$Z^2$-Sampling,” which uses temporal coherence from the Probability Flow ODE plus a cached Temporal Semantic Surrogate to return sampling efficiency to the standard 2-NFE baseline.
  • The method is theoretically analyzed with Backward Error Analysis, showing that the discrete collapse induces a directional-derivative curvature penalty that preserves semantic exploration.
  • Experiments indicate $Z^2$-Sampling breaks the performance–efficiency Pareto frontier and generalizes across diffusion architectures (U-Nets, DiTs) and modalities (image/video), composing cleanly with other alignment methods such as AYS and Diffusion-DPO.
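The cost overhead that motivates the paper can be illustrated with a toy sketch. This is a minimal illustration, not the paper's algorithm: `eps` stands in for one diffusion-network evaluation, and the update rules are placeholder arithmetic. The point is purely the NFE accounting — an explicit zigzag step (denoise, invert back to higher noise, denoise again) spends three network evaluations where a plain step spends one, matching the "triple the NFE cost" claim.

```python
def eps(x, t, counter):
    """Stand-in for one diffusion-network call (1 NFE); toy noise prediction."""
    counter[0] += 1
    return 0.1 * x

def plain_step(x, t, counter):
    # Standard guided denoising step: a single network evaluation.
    return x - eps(x, t, counter)

def zigzag_step(x, t, counter):
    # Explicit Z-Sampling-style zigzag: forward, backward, forward.
    x_denoised = x - eps(x, t, counter)                    # 1) denoise
    x_inverted = x_denoised + eps(x_denoised, t, counter)  # 2) invert (renoise)
    return x_inverted - eps(x_inverted, t, counter)        # 3) denoise again
```

Counting the calls makes the tripling concrete: `plain_step` registers 1 NFE per step, `zigzag_step` registers 3.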

Abstract

Diffusion models have achieved unprecedented success in text-aligned generation, largely driven by Classifier-Free Guidance (CFG). However, standard CFG operates strictly on instantaneous gradients, omitting the intrinsic curvature of the data manifold. Recent methods like Zigzag-sampling (Z-Sampling) explicitly traverse multi-step forward-backward trajectories to probe this curvature, significantly improving semantic alignment. Yet, these explicit traversals triple the Neural Function Evaluation (NFE) cost and introduce unconstrained truncation errors from off-manifold evaluations, causing cumulative drift from the true marginal distribution. In this paper, we theoretically demonstrate that the explicit zigzag sequence is topologically reducible. We propose Implicit Z-Sampling, rigorously proving that intermediate states can be algebraically annihilated via operator dualities, physically eliminating off-manifold approximation errors. To push sampling efficiency to its theoretical lower bound, we introduce Z^2-Sampling (Zero-cost Zigzag Sampling). Exploiting the Probability Flow ODE's temporal coherence, Z^2-Sampling couples implicit algebraic collapse with a dynamically cached Temporal Semantic Surrogate. This restores the standard 2-NFE baseline without sacrificing semantic exploration. We formally prove via Backward Error Analysis that this discrete collapse inherently synthesizes a directional derivative curvature penalty. Finally, extensive evaluations demonstrate that Z^2-Sampling structurally shatters the performance-efficiency Pareto frontier. We validate its universal applicability across diverse architectures (U-Nets, DiTs) and modalities (image/video), establishing seamless orthogonality with advanced alignment frameworks (AYS, Diffusion-DPO).
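The caching idea behind $Z^2$-Sampling can be sketched as follows. Everything here is an illustrative assumption, not the paper's exact method: the "Temporal Semantic Surrogate" is modeled as the previous step's cached noise prediction, and a simple linear extrapolation stands in for the curvature information an explicit zigzag would re-evaluate. A real CFG step costs 2 NFE (conditional + unconditional); the toy uses one `eps` call per step to keep the accounting simple, but the structural point is the same — no extra network evaluations beyond the per-step baseline.

```python
class SurrogateSampler:
    """Toy sampler: reuse the previous step's prediction instead of
    re-evaluating the network for the zigzag's extra passes."""

    def __init__(self):
        self.cache = None   # last step's noise prediction (the "surrogate")
        self.nfe = 0        # network evaluations performed

    def eps(self, x):
        self.nfe += 1
        return 0.1 * x      # stand-in for the diffusion network

    def step(self, x):
        e_fresh = self.eps(x)   # exactly one fresh evaluation per step
        e = e_fresh
        if self.cache is not None:
            # Extrapolate from the cached prediction: a zero-cost proxy for
            # the curvature probe that explicit zigzagging would pay NFEs for.
            e = e_fresh + 0.5 * (e_fresh - self.cache)
        self.cache = e_fresh    # refresh the temporal surrogate
        return x - e
```

Running the sampler for N steps performs exactly N network evaluations, i.e. the baseline per-step cost is restored while the extrapolation term still injects information from the trajectory's temporal coherence.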