A-SelecT: Automatic Timestep Selection for Diffusion Transformer Representation Learning

arXiv cs.AI / 3/30/2026


Key Points

  • The paper proposes A-SelecT, a method that automatically selects the most information-rich timestep for Diffusion Transformer (DiT) representation learning in a single run, addressing limitations from prior timestep-searching approaches.
  • A-SelecT is designed to remove the need for computationally expensive exhaustive timestep searching, while also making better use of DiT-specific features for discriminative tasks.
  • Experiments on classification and segmentation benchmarks show that DiT combined with A-SelecT outperforms previous diffusion-based approaches and trains more efficiently.
  • The work positions diffusion models, particularly DiT, as strong candidates for discriminative representation learning via generative pre-training, offering an alternative to traditional U-Net-based diffusion architectures.

Abstract

Diffusion models have significantly reshaped the field of generative artificial intelligence and are now increasingly explored for their capacity in discriminative representation learning. Diffusion Transformer (DiT) has recently gained attention as a promising alternative to conventional U-Net-based diffusion models, opening an avenue for downstream discriminative tasks via generative pre-training. However, its training efficiency and representational capacity remain largely constrained by inadequate timestep searching and insufficient exploitation of DiT-specific feature representations. To address this, we introduce Automatically Selected Timestep (A-SelecT), which dynamically pinpoints DiT's most information-rich timestep from the selected transformer feature in a single run, eliminating both computationally intensive exhaustive timestep searching and suboptimal discriminative feature selection. Extensive experiments on classification and segmentation benchmarks demonstrate that DiT, empowered by A-SelecT, surpasses all prior diffusion-based attempts efficiently and effectively.
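The abstract does not state which criterion A-SelecT uses to decide that a timestep is "information-rich." The sketch below therefore illustrates only the general idea of scoring every candidate timestep in one pass, rather than training a separate probe per timestep. It is a toy example, and several parts are assumptions that do not come from the paper:

* It uses a DDPM-style linear beta schedule.
* A random projection followed by tanh stands in for real DiT block activations.
* A label-based Fisher ratio serves as the information proxy.

All function names are hypothetical, and A-SelecT's actual scoring rule may differ (for example, it may not use labels at all).

```python
import numpy as np


def linear_alpha_bar(num_steps: int = 1000) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for an assumed DDPM-style linear beta schedule."""
    betas = np.linspace(1e-4, 0.02, num_steps)
    return np.cumprod(1.0 - betas)


def noise_to_timestep(x0: np.ndarray, t: int, alpha_bar: np.ndarray,
                      rng: np.random.Generator) -> np.ndarray:
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def toy_features(x_t: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for one DiT block's activations: random projection + tanh."""
    return np.tanh(x_t @ proj)


def fisher_score(feats: np.ndarray, labels: np.ndarray) -> float:
    """Mean per-dimension Fisher ratio: squared class-mean gap over summed class variances."""
    a, b = feats[labels == 0], feats[labels == 1]
    gap = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    spread = a.var(axis=0) + b.var(axis=0) + 1e-8  # small constant avoids division by zero
    return float(np.mean(gap / spread))


def select_timestep(x0, labels, candidates, proj, alpha_bar, seed=0):
    """Score every candidate timestep in a single pass (no per-timestep probe training)."""
    rng = np.random.default_rng(seed)
    scores = [
        fisher_score(toy_features(noise_to_timestep(x0, t, alpha_bar, rng), proj), labels)
        for t in candidates
    ]
    best = candidates[int(np.argmax(scores))]
    return best, scores


# Toy two-class data standing in for images (illustrative only, not the paper's benchmarks).
rng = np.random.default_rng(42)
n, d, k = 200, 16, 32
labels = np.repeat([0, 1], n)
class_means = np.where(labels[:, None] == 0, -1.0, 1.0)  # shape (2n, 1), broadcast over d
x0 = class_means + 0.5 * rng.standard_normal((2 * n, d))
proj = rng.standard_normal((d, k)) / np.sqrt(d)

alpha_bar = linear_alpha_bar()
candidates = [10, 100, 250, 500, 750, 999]
best_t, scores = select_timestep(x0, labels, candidates, proj, alpha_bar)
print(best_t, [round(s, 3) for s in scores])
```

The toy features are not learned, so their ranking mostly tracks the noise level. With a trained DiT, the ranking depends on what the network has learned at each timestep. That dependence is why timestep choice is non-trivial, and why prior approaches resorted to exhaustive search.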