Analytical Correction for Subsampling Bias in Drifting Models

arXiv cs.LG / 5/1/2026


Key Points

  • The paper shows that, in drifting one-step generative models, using minibatch samples to approximate centroids produces a biased estimator due to softmax self-normalization, with a pointwise bias of order O(1/n) (illustrated numerically in the sketch after this list).
  • Because correcting the bias would require an intractable expectation over the full underlying distributions, the authors introduce Analytical Bias Correction (ABC) as a closed-form plug-in adjustment estimated from in-batch statistics.
  • Theoretical results prove ABC reduces the bias scaling from O(1/n) to O(1/n^2), does not increase total variance at first order, and keeps the corrected centroid within the original convex hull.
  • Experiments (including CIFAR-10) confirm the predicted bias-scaling behavior and show that ABC improves FID and training speed, especially when the minibatch size n is small.

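To make the source of the O(1/n) bias concrete, here is a small toy sketch, not taken from the paper: the self-normalized centroid is a ratio of two in-batch means, and ratio estimators of this kind are biased at order 1/n. The 1-D Gaussian data, the Gaussian-kernel weight, and names such as `weights` and `c_true` are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D setup (illustrative only; the paper's drifting field, kernel, and
# data are not reproduced here).  Weights are a Gaussian kernel around a
# fixed query point x, playing the role of the softmax weights.
x, tau = 0.5, 1.0

def weights(y):
    return np.exp(-((y - x) ** 2) / tau)

# "True" self-normalized centroid E[w(Y) Y] / E[w(Y)], approximated with a
# very large reference sample.
y_ref = rng.normal(size=1_000_000)
c_true = np.sum(weights(y_ref) * y_ref) / np.sum(weights(y_ref))

# Average many minibatch centroids for several batch sizes n.  The minibatch
# estimate is a ratio of in-batch means, so its bias shrinks like O(1/n):
# n * bias stays roughly constant across the rows printed below.
for n in (4, 8, 16, 32, 64):
    batches = rng.normal(size=(100_000, n))
    w = weights(batches)
    c_hat = (w * batches).sum(axis=1) / w.sum(axis=1)  # self-normalized centroid
    bias = c_hat.mean() - c_true
    print(f"n={n:3d}  bias={bias:+.5f}  n*bias={n * bias:+.4f}")
```
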
Abstract

Drifting models are capable one-step generative models trained to follow a drifting field. The field combines attractive and repulsive softmax-weighted centroids over the data and current-generator distributions. In practice, only a minibatch of n samples from each distribution is available, and each centroid is approximated by an empirical estimate. In this paper, we begin by showing that the minibatch centroid is in general a biased estimator of the target centroid, with a pointwise O(1/n) bias arising from softmax self-normalization. Correcting this bias requires the expectation over the full distribution, which is intractable. We instead approximate the leading bias term from in-batch statistics and propose Analytical Bias Correction (ABC), a closed-form plug-in adjustment. We prove that ABC reduces the bias from O(1/n) to O(1/n^2), introduces no first-order increase in total variance, and preserves convex-hull containment of the corrected centroid. In practice, ABC requires only two additional lines of code and has negligible wall-time overhead under compiled execution. Toy experiments confirm the theoretical O(1/n) and O(1/n^2) scaling. On CIFAR-10, ABC reduces FID and trains faster, with the largest gains at small n, where the bias is most significant.
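
The abstract notes that ABC amounts to roughly two additional lines of code. As a hedged illustration of what such a plug-in adjustment could look like, the sketch below subtracts the standard leading-order ratio-estimator bias term, estimated from in-batch (co)variances; the paper's exact closed form, and its convex-hull guarantee, may differ from this textbook expansion, and the toy data and kernel are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# One minibatch of n samples.  y_i are the samples (toy 1-D Gaussian) and
# w_i a Gaussian-kernel weight around a query point x -- illustrative stand-ins
# for the paper's softmax weights, not its actual setup.
n, x, tau = 16, 0.5, 1.0
y = rng.normal(size=n)
w = np.exp(-((y - x) ** 2) / tau)

# Naive self-normalized centroid: a ratio of two in-batch means, biased O(1/n).
wy = w * y
A, B = wy.mean(), w.mean()
c_hat = A / B

# Plug-in correction: subtract the leading-order ratio-estimator bias term,
# estimated from in-batch (co)variances.  This is the textbook expansion; the
# paper's exact ABC formula and its convex-hull guarantee may differ.
cov = np.cov(wy, w, ddof=1)                      # 2x2 covariance of (w*y, w)
bias_hat = (c_hat * cov[1, 1] - cov[0, 1]) / (n * B**2)
c_abc = c_hat - bias_hat

print(f"naive centroid:      {c_hat:+.4f}")
print(f"corrected centroid:  {c_abc:+.4f}")
```

In this form the correction reuses only quantities already computed for the naive centroid, which is consistent with the abstract's claim of negligible wall-time overhead under compiled execution.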