Abstract
We study zeroth-order optimization in which solutions must minimize a cost $d(s)$ while maintaining high probability under a complex generative prior $L(s)$ (e.g., a parameterized model). This reduces to sampling from a target distribution proportional to $L(s)\, e^{-T \cdot d(s)}$. Since classical model-based optimization (MBO) lacks finite-sample guarantees for expressive approximate learners, we introduce ``coarse learnability'', a flexible statistical assumption requiring only that a learned model cover the target's probability mass up to a polynomial factor. Leveraging this assumption, we design an iterative MBO algorithm, \alift, with a sample-correction step that provably approximates the target using only a polynomial number of samples. We apply this framework to the global optimization of non-convex objectives bounded by a quadratic envelope in $\mathbb{R}^n$, where we show that the assumption is naturally satisfied by a family of ``optimistic'' posterior distributions. To reach global $\varepsilon$-optimality, this yields a sample complexity of $\widetilde{O}(\log(1/\varepsilon))$, a rate characteristic of optimistic space-partitioning methods. We further justify coarse learnability as an assumption on generative priors, proving that in simple settings parametric maximum-likelihood estimation and over-smoothed kernel density estimators naturally satisfy it. Finally, our framework is partly motivated by inference-time alignment: although our primary contribution concerns the theoretical foundations of MBO, we provide qualitative evidence that, in simple settings, even primitive LLMs can shift their output distributions toward lower-cost regions when fine-tuned with zeroth-order feedback.
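As a minimal illustration of the tilted target $L(s)\, e^{-T \cdot d(s)}$, the following Python sketch (all names and constants here are ours, purely for illustration; this is not the \alift algorithm) takes a toy Gaussian-mixture prior in one dimension and reweights prior samples by the exponentiated cost via self-normalized importance sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D setup (our choice, not from the paper): the generative prior
# L is a two-component Gaussian mixture with modes at -2 and +2, and the cost d
# is a quadratic penalizing distance from +2.
def d(s):
    return (s - 2.0) ** 2

T = 5.0  # inverse "temperature": larger T concentrates mass on low-cost regions

# Self-normalized importance sampling: since proposals are drawn directly from
# the prior L, the importance weight for the target pi(s) ~ L(s) e^{-T d(s)}
# reduces to e^{-T d(s)}.
proposals = np.concatenate([rng.normal(-2.0, 1.0, 5000),
                            rng.normal(2.0, 1.0, 5000)])
log_w = -T * d(proposals)
w = np.exp(log_w - log_w.max())   # subtract max for numerical stability
w /= w.sum()                      # normalize so the weights sum to one

# Weighted average of the proposals: mass shifts onto the low-cost mode at +2.
posterior_mean = float(np.sum(w * proposals))
```

The prior puts equal mass on both modes, but after reweighting by $e^{-T \cdot d(s)}$ essentially all weight concentrates on samples near $+2$, so `posterior_mean` lands close to the low-cost mode.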