Sample-Efficient Optimization over Generative Priors via Coarse Learnability

arXiv stat.ML / 5/6/2026


Key Points

  • The paper studies zeroth-order optimization in which solutions must minimize a cost function while maintaining high probability under a complex generative prior; this is reformulated as sampling from a distribution proportional to the prior times an exponential cost penalty (see the sketch after this list).
  • It argues that classical model-based optimization lacks finite-sample guarantees for expressive learned models and introduces a weaker statistical condition called "coarse learnability," requiring only that a learned model covers the target's probability mass up to a polynomial factor.
  • Based on this assumption, the authors propose an iterative model-based optimization algorithm (Alift) with a sample-correction step, showing it can approximate the target distribution using only a polynomial number of samples.
  • The framework yields global ε-optimality for certain non-convex objectives bounded by a quadratic envelope, with sample complexity on the order of Õ(log(1/ε)), i.e., logarithmic in 1/ε up to polylogarithmic factors, and links the assumption to a family of "optimistic" posterior distributions.
  • As supporting theory, the paper shows coarse learnability holds naturally in simpler settings (including parametric maximum likelihood and over-smoothed kernel density estimation) and, motivated by inference-time alignment, provides qualitative evidence that even primitive LLMs shift toward lower-cost regions when fine-tuned with zeroth-order feedback.
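
The reformulation in the first bullet turns the penalized optimization problem into sampling from a target π(s) ∝ L(s)·e^{−T·d(s)}. As a minimal, non-authoritative illustration of what that target means in practice (this is not the paper's Alift algorithm), the sketch below reweights samples drawn from the prior by e^{−T·d(s)} and resamples; the function name `tempered_resample` and the toy Gaussian prior with a quadratic cost are our own assumptions, not from the paper.

```python
import numpy as np

def tempered_resample(prior_sampler, cost, T, n_samples, rng=None):
    """Approximate samples from pi(s) ∝ L(s) * exp(-T * cost(s)) by
    self-normalized importance resampling of prior samples.

    prior_sampler(n) is assumed to return n samples drawn from the prior L,
    so the importance weight of each sample reduces to exp(-T * cost(s)).
    """
    rng = rng or np.random.default_rng()
    samples = prior_sampler(n_samples)                   # s_i ~ L
    log_w = -T * np.array([cost(s) for s in samples])    # unnormalized log-weights
    log_w -= log_w.max()                                 # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum()
    idx = rng.choice(n_samples, size=n_samples, p=w)     # resample in proportion to weight
    return [samples[i] for i in idx]

if __name__ == "__main__":
    # Toy usage: standard Gaussian prior, quadratic cost centered at 2.
    rng = np.random.default_rng(0)
    prior = lambda n: list(rng.standard_normal(n))
    cost = lambda s: (s - 2.0) ** 2
    out = tempered_resample(prior, cost, T=5.0, n_samples=5000, rng=rng)
    print(np.mean(out))  # shifted from 0 toward the low-cost region near 2
```

The larger the temperature T, the more mass the resampled set concentrates on low-cost regions that the prior already covers, which is exactly the trade-off the target distribution encodes.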

Abstract

We study zeroth-order optimization where solutions must minimize a cost d(s) while maintaining high probability under a complex generative prior L(s) (e.g., a parameterized model). This reduces to sampling from a target distribution proportional to L(s)·e^{−T·d(s)}. Since classical model-based optimization (MBO) lacks finite-sample guarantees for expressive approximate learners, we introduce "coarse learnability", a flexible statistical assumption requiring only that a learned model covers the target's probability mass within a polynomial factor. Leveraging this assumption, we design an iterative MBO algorithm called Alift with a sample-correction step that provably approximates the target using only a polynomial number of samples. We apply this framework to globally optimizing non-convex objectives bounded by a quadratic envelope in ℝ^n, where we show this assumption is naturally satisfied for a family of "optimistic" posterior distributions. To reach global ε-optimality, this implies a sample complexity of Õ(log(1/ε)), a rate characteristic of optimistic space-partitioning methods. We further justify coarse learnability as an assumption for generative priors theoretically, proving that in simple settings, parametric maximum likelihood estimation and over-smoothed kernel density estimators naturally satisfy it. Finally, one motivation for our framework comes from inference-time alignment. Though our primary contribution pertains to the theoretical foundations of MBO, we provide qualitative evidence that, in simple settings, even primitive LLMs can shift their distributions toward lower-cost regions when fine-tuned with zeroth-order feedback.
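
For context on what "classical model-based optimization" refers to in the abstract, the sketch below is a generic cross-entropy-style MBO loop: sample candidates from a learned model, keep the lowest-cost fraction, refit the model, and repeat. It is the kind of iterative loop the paper says lacks finite-sample guarantees with expressive learners; it is not Alift and it omits the paper's sample-correction step. The `fit_model`/`sample_model` hooks and the toy one-dimensional Gaussian model are hypothetical.

```python
import numpy as np

def classic_mbo_loop(fit_model, sample_model, cost, rounds, n_per_round,
                     elite_frac=0.2, rng=None):
    """Generic model-based optimization loop (cross-entropy-method style).

    Each round draws candidates from the current learned model, keeps the
    lowest-cost 'elite' fraction, and refits the model to those elites.
    This is only the generic scheme the paper builds on, not Alift, and it
    has no correction step keeping the model close to the tempered target.
    """
    rng = rng or np.random.default_rng()
    model = fit_model(None)                          # hypothetical hook: initial model near the prior
    for _ in range(rounds):
        cands = sample_model(model, n_per_round, rng)
        costs = np.array([cost(s) for s in cands])
        n_elite = max(1, int(elite_frac * n_per_round))
        elite_idx = np.argsort(costs)[:n_elite]      # lowest-cost candidates
        model = fit_model([cands[i] for i in elite_idx])
    return model

if __name__ == "__main__":
    # Toy instantiation: the "model" is a 1-D Gaussian (mean, std) fit to kept samples.
    rng = np.random.default_rng(0)
    fit = lambda xs: (0.0, 3.0) if xs is None else (float(np.mean(xs)), float(np.std(xs)) + 1e-3)
    sample = lambda m, n, rng: list(rng.normal(m[0], m[1], size=n))
    cost = lambda s: (s - 2.0) ** 2
    mean, std = classic_mbo_loop(fit, sample, cost, rounds=10, n_per_round=200, rng=rng)
    print(mean, std)  # the model's mean drifts toward the low-cost region near 2
```

With an expressive learner in place of the toy Gaussian, nothing in this loop guarantees the refit model still covers the target's probability mass after finitely many samples; that gap is what the coarse-learnability assumption and Alift's correction step are meant to close.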