Beyond Fixed Formulas: Data-Driven Linear Predictor for Efficient Diffusion Models

arXiv cs.LG / 4/30/2026


Key Points

  • The paper addresses the high sampling cost of Diffusion Transformers (DiTs) and argues that hand-crafted feature-caching formulas break down when aggressive skipping is used.
  • It introduces L2P (Learnable Linear Predictor), a data-driven feature-caching framework that replaces the fixed coefficients of hand-crafted formulas with learnable, per-timestep weights to reconstruct current features from past trajectories (a minimal sketch follows this list).
  • L2P's predictor trains rapidly (about 20 seconds on a single GPU) and is designed for efficient DiT inference.
  • Experiments show substantial gains, including a 4.55× FLOPs reduction and a 4.15× latency speedup on FLUX.1-dev, and strong quality retention at up to 7.18× acceleration on Qwen-Image, where prior baselines show noticeable degradation.
  • The authors provide code publicly and conclude that learning linear predictors is an effective strategy for efficient diffusion model sampling.
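
The mechanics of the predictor are simple enough to sketch. Below is a minimal PyTorch illustration of a per-timestep linear predictor over cached features; the class name `LinearFeaturePredictor`, the `history` window, and the weight layout are our own assumptions for exposition, not the authors' implementation (see the linked repository for that).

```python
import torch

class LinearFeaturePredictor:
    """Hypothetical per-timestep linear predictor over cached DiT features.

    At timestep t, instead of running the expensive transformer block,
    the current feature is approximated as a learned linear combination
    of the k most recently cached features:
        x_t ≈ sum_i weights[t, i] * cache[i]
    """

    def __init__(self, weights: torch.Tensor, history: int):
        # weights: (num_timesteps, history) learned coefficients,
        # one small weight vector per denoising timestep.
        self.weights = weights
        self.history = history
        self.cache: list[torch.Tensor] = []

    def push(self, feature: torch.Tensor) -> None:
        # Store a freshly computed feature, keeping only the last `history`.
        self.cache.append(feature)
        if len(self.cache) > self.history:
            self.cache.pop(0)

    def predict(self, t: int) -> torch.Tensor:
        # Reconstruct the current feature from the cached trajectory
        # with the timestep-specific coefficients.
        stacked = torch.stack(self.cache)           # (k, ...feature dims)
        w = self.weights[t, : len(self.cache)]      # (k,)
        return (w.view(-1, *[1] * (stacked.dim() - 1)) * stacked).sum(dim=0)
```

Timesteps served by `predict` skip the full DiT computation entirely, which is where the FLOPs and latency savings in the results above come from.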

Abstract

To address the high sampling cost of Diffusion Transformers (DiTs), feature caching offers a training-free acceleration method. However, existing methods rely on hand-crafted forecasting formulas that fail under aggressive skipping. We propose L2P (Learnable Linear Predictor), a simple data-driven caching framework that replaces fixed coefficients with learnable per-timestep weights. Rapidly trained in ~20 seconds on a single GPU, L2P accurately reconstructs current features from past trajectories. L2P significantly outperforms existing baselines: it achieves a 4.55× FLOPs reduction and 4.15× latency speedup on FLUX.1-dev, and maintains high visual fidelity under up to 7.18× acceleration on Qwen-Image models, where prior methods show noticeable quality degradation. Our results show that learning linear predictors is highly effective for efficient DiT inference. Code is available at https://github.com/Aredstone/L2P-Cache.
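
Because each timestep's predictor is just a handful of linear coefficients, it can be fitted in closed form from a few recorded feature trajectories, which is consistent with the ~20-second training budget. The sketch below assumes one scalar weight per cached feature and an ordinary least-squares fit; the function name and array shapes are hypothetical, not taken from the paper.

```python
import numpy as np

def fit_timestep_weights(past_feats: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Fit coefficients w minimizing ||sum_i w[i] * past_i - target||^2.

    past_feats: (n_samples, k, d) -- the k cached features per sample.
    target:     (n_samples, d)    -- the true feature at this timestep.
    Returns w:  (k,)              -- one scalar weight per cached feature.
    """
    n, k, d = past_feats.shape
    # Treat every feature dimension of every sample as one regression row:
    # the design matrix A is (n * d, k) and the target b is (n * d,).
    A = past_feats.transpose(0, 2, 1).reshape(n * d, k)
    b = target.reshape(n * d)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w
```

With only k unknowns per timestep, each solve is a tiny regression, which makes a seconds-long fit across all timesteps plausible even on modest hardware.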