Continuous Diffusion Transformers for Designing Synthetic Regulatory Elements

arXiv cs.LG / 3/12/2026

Key Points

  • Introducing a parameter-efficient Diffusion Transformer (DiT) for generating 200bp cell-type-specific regulatory DNA sequences by replacing the U-Net backbone with a transformer denoiser and a 2D CNN input encoder.
  • The model matches the U-Net's best validation loss in 13 epochs (60x fewer training iterations) and converges to a 39% lower final validation loss, while reducing memorization of training data from 5.3% to 1.7% of generated sequences, as measured by BLAT alignment.
  • Ablation studies show the CNN encoder is essential; without it, validation loss increases by about 70% regardless of positional embedding choice.
  • DDPO fine-tuning with Enformer as a reward model yields a 38x improvement in predicted regulatory activity, and cross-validation against DRAKES indicates the improvements reflect genuine regulatory signal rather than reward-model overfitting.
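The architecture described above pairs a 2D CNN input encoder with a transformer denoiser. A minimal PyTorch sketch of that idea follows; the layer sizes, kernel shapes, and class names here are illustrative assumptions, not the paper's exact configuration. The key move is treating the one-hot DNA sequence (4 bases x 200 positions) as a single-channel 2D image, convolving across all four base channels at once, and feeding the resulting per-position tokens to a transformer that predicts the noise.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: dimensions and layer choices are assumptions,
# not the paper's reported architecture.

class CNNEncoder(nn.Module):
    """2D CNN encoder: views a one-hot DNA sequence (4 x 200) as a 1-channel image."""
    def __init__(self, d_model=128):
        super().__init__()
        # Kernel height 4 spans all base channels; width 7 captures local motifs.
        self.conv = nn.Sequential(
            nn.Conv2d(1, d_model, kernel_size=(4, 7), padding=(0, 3)),
            nn.GELU(),
        )

    def forward(self, x):                      # x: (B, 1, 4, 200) noisy one-hot input
        h = self.conv(x)                       # (B, d_model, 1, 200)
        return h.squeeze(2).transpose(1, 2)    # (B, 200, d_model) tokens

class DiTDenoiser(nn.Module):
    """Transformer denoiser over CNN-encoded tokens; predicts per-position noise."""
    def __init__(self, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.encoder = CNNEncoder(d_model)
        # Simple timestep embedding (an assumption; papers often use sinusoidal features).
        self.t_embed = nn.Sequential(
            nn.Linear(1, d_model), nn.SiLU(), nn.Linear(d_model, d_model)
        )
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True,
        )
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 4)      # noise estimate over the 4 base channels

    def forward(self, x, t):
        tok = self.encoder(x) + self.t_embed(t.float().view(-1, 1)).unsqueeze(1)
        out = self.head(self.blocks(tok))      # (B, 200, 4)
        return out.transpose(1, 2).unsqueeze(1)  # (B, 1, 4, 200), same shape as input
```

Removing `CNNEncoder` and embedding bases directly would correspond to the ablation described above, where validation loss degrades by about 70% regardless of positional embedding choice.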

Abstract

We present a parameter-efficient Diffusion Transformer (DiT) for generating 200bp cell-type-specific regulatory DNA sequences. By replacing the U-Net backbone of DNA-Diffusion with a transformer denoiser equipped with a 2D CNN input encoder, our model matches the U-Net's best validation loss in 13 epochs (60x fewer) and converges 39% lower, while reducing memorization from 5.3% to 1.7% of generated sequences aligning to training data via BLAT. Ablations show the CNN encoder is essential: without it, validation loss increases 70% regardless of positional embedding choice. We further apply DDPO fine-tuning using Enformer as a reward model, achieving a 38x improvement in predicted regulatory activity. Cross-validation against DRAKES on an independent prediction task confirms that improvements reflect genuine regulatory signal rather than reward model overfitting.
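The DDPO fine-tuning step in the abstract treats the reverse diffusion process as a policy and uses a frozen reward model (Enformer, in the paper) to score generated sequences. A minimal REINFORCE-style sketch of one DDPO update follows; `denoiser`, `reward_model`, the fixed step noise `sigma`, and the short trajectory length are all placeholder assumptions, not the paper's actual training setup.

```python
import torch

# Sketch of one DDPO-style policy-gradient update. The denoiser is treated as
# a policy whose per-step reverse transitions are Gaussian around its output.
# All hyperparameters here are illustrative assumptions.
def ddpo_step(denoiser, reward_model, optimizer, batch_size=8, T=10, sigma=0.1):
    # 1) Roll out denoising trajectories, recording the log-prob of each step.
    x = torch.randn(batch_size, 1, 4, 200)
    log_probs = []
    for t in reversed(range(T)):
        t_batch = torch.full((batch_size,), t)
        mean = denoiser(x, t_batch)                    # predicted reverse-step mean
        dist = torch.distributions.Normal(mean, sigma)
        x = dist.sample()                              # sample next state (no grad)
        log_probs.append(dist.log_prob(x).sum(dim=(1, 2, 3)))  # grad flows via mean

    # 2) Score final sequences with the frozen reward model
    #    (standing in for Enformer's predicted regulatory activity).
    with torch.no_grad():
        rewards = reward_model(x)                      # (batch_size,)
        advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # 3) REINFORCE: raise the log-prob of steps that produced high-reward sequences.
    loss = -(torch.stack(log_probs).sum(dim=0) * advantages).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards.mean().item()
```

Because the reward model stays frozen, the check against DRAKES on an independent prediction task is what distinguishes genuine gains in regulatory activity from the policy simply overfitting this one reward signal.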