Learning to Shuffle: Block Reshuffling and Reversal Schemes for Stochastic Optimization

arXiv cs.LG / 4/2/2026


Key Points

  • The paper uses an LLM-guided program-evolution pipeline to discover an effective shuffling rule for without-replacement SGD, then distills it into two analyzable components: block reshuffling, which provably reduces prefix-gradient variance constants below random reshuffling under mild conditions, and paired reversal, which symmetrizes the epoch map and reduces order sensitivity from quadratic to cubic in the step size.

Abstract

Shuffling strategies for stochastic gradient descent (SGD), including incremental gradient, shuffle-once, and random reshuffling, are supported by rigorous convergence analyses for arbitrary within-epoch permutations. In particular, random reshuffling is known to improve optimization constants relative to cyclic and shuffle-once schemes. However, existing theory offers limited guidance on how to design new data-ordering schemes that further improve optimization constants or stability beyond random reshuffling. In this paper, we design a pipeline using a large language model (LLM)-guided program evolution framework to discover an effective shuffling rule for without-replacement SGD. Abstracting from this instance, we identify two fundamental structural components: block reshuffling and paired reversal. We analyze these components separately and show that block reshuffling strictly reduces prefix-gradient variance constants within the unified shuffling framework, yielding provable improvements over random reshuffling under mild conditions. Separately, we show that paired reversal symmetrizes the epoch map and cancels the leading order-dependent second-order term, reducing order sensitivity from quadratic to cubic in the step size. Numerical experiments with the discovered algorithm validate the theory and demonstrate consistent gains over standard shuffling schemes across convex and nonconvex benchmarks.
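The abstract does not give the exact construction, but the two components it names can be sketched. The following is a minimal, hypothetical illustration: `block_reshuffle` partitions the index set into contiguous blocks and shuffles both the block order and the order within each block, while `paired_reversal` follows one epoch's permutation with its exact reversal in the next epoch, a symmetrization consistent with the abstract's description of canceling the leading order-dependent term. All function names and details are assumptions, not the paper's actual algorithm.

```python
import random

def block_reshuffle(n, block_size, rng=random):
    """Hypothetical sketch: partition indices 0..n-1 into contiguous
    blocks, shuffle the block order, and shuffle within each block."""
    blocks = [list(range(i, min(i + block_size, n)))
              for i in range(0, n, block_size)]
    rng.shuffle(blocks)          # reshuffle block order across the epoch
    for b in blocks:
        rng.shuffle(b)           # reshuffle samples within each block
    return [i for b in blocks for i in b]

def paired_reversal(perm):
    """Hypothetical sketch of paired reversal: run one epoch in the
    given order, then the next epoch in the exact reversed order."""
    return perm + perm[::-1]
```

For example, `paired_reversal(block_reshuffle(8, 2))` yields a two-epoch index sequence in which the second epoch mirrors the first, so order-dependent effects of adjacent steps partially cancel.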
