Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration

arXiv cs.CV / 3/27/2026


Key Points

  • The paper analyzes scaling within the denoising process of Diffusion Transformers (DiTs) for generative tasks, showing that even a single learned scaling parameter can boost the performance of a DiT block.
  • It introduces Calibri, a parameter-efficient calibration method that optimizes DiT components while modifying only about ~100 parameters.
  • Calibri treats DiT calibration as a black-box reward optimization problem and uses an evolutionary algorithm to find effective calibration settings.
  • Experiments across multiple text-to-image models show consistent gains in generative quality, with the added benefit of reducing the number of inference steps needed to generate images.
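The calibration loop described above can be sketched as a simple black-box search: treat the ~100 per-block scaling parameters as the search space and maximize a reward with an evolutionary strategy. The reward function, block count, and hyperparameters below are illustrative placeholders assumed for the sketch, not the paper's actual setup.

```python
# Hypothetical sketch of Calibri-style calibration: a (1+lambda) evolution
# strategy over a small vector of per-block scaling parameters, maximizing
# a black-box reward. All names and values here are illustrative.
import random

NUM_BLOCKS = 100  # the paper reports modifying only about ~100 parameters


def reward(scales):
    # Placeholder for an image-quality reward (e.g., a preference score of
    # images generated with these per-block scales). This toy objective
    # peaks when every block's scale equals 1.05.
    return -sum((s - 1.05) ** 2 for s in scales)


def evolve(generations=200, offspring=8, sigma=0.05, seed=0):
    rng = random.Random(seed)
    best = [1.0] * NUM_BLOCKS  # identity scaling = uncalibrated DiT
    best_r = reward(best)
    for _ in range(generations):
        for _ in range(offspring):
            # Mutate every scale with small Gaussian noise and keep the
            # candidate only if it strictly improves the reward.
            cand = [s + rng.gauss(0.0, sigma) for s in best]
            r = reward(cand)
            if r > best_r:
                best, best_r = cand, r
    return best, best_r
```

Because the objective is treated as a black box, the same loop works whether the reward comes from a toy function, a learned quality model, or human preference scores; only `reward` changes.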

Abstract

In this paper, we uncover the hidden potential of Diffusion Transformers (DiTs) to significantly enhance generative tasks. Through an in-depth analysis of the denoising process, we demonstrate that introducing a single learned scaling parameter can substantially improve the performance of DiT blocks. Building on this insight, we propose Calibri, a parameter-efficient approach that optimally calibrates DiT components to elevate generative quality. Calibri frames DiT calibration as a black-box reward optimization problem, which it solves efficiently with an evolutionary algorithm while modifying just ~100 parameters. Experimental results reveal that despite its lightweight design, Calibri consistently improves performance across various text-to-image models. Notably, Calibri also reduces the inference steps required for image generation, all while maintaining high-quality outputs.