OptProver: Bridging Olympiad and Optimization through Continual Training in Formal Theorem Proving

arXiv cs.LG / 4/28/2026

📰 NewsDeveloper Stack & InfrastructureModels & Research

Key Points

  • The paper introduces OptProver, a formally trained model aimed at bridging the gap between Olympiad-level theorem proving and undergraduate optimization topics, which have been less explored by current systems.
  • It argues that naive transfer fails due to major distribution shift caused by optimization’s reliance on specialized formalisms such as convexity, optimality conditions, and algorithmic analysis.
  • OptProver mitigates this shift by using large-scale optimization-focused data curation via expert iteration and a specialized preference-learning objective that combines perplexity-weighted optimization with penalties for valid but non-progressing proof steps.
  • The authors build a new Lean 4 benchmark for optimization and report state-of-the-art Pass@1 and Pass@32 results among similarly sized models, while preserving strong performance on general theorem-proving tasks (avoiding catastrophic forgetting).

Abstract

Recent advances in formal theorem proving have focused on Olympiad-level mathematics, leaving undergraduate domains largely unexplored. Optimization, fundamental to machine learning, operations research, and scientific computing, remains underserved by existing provers. Its reliance on domain-specific formalisms (convexity, optimality conditions, and algorithmic analysis) creates significant distribution shift, making naive domain transfer ineffective. We present OptProver, a trained model that achieves robust transfer from Olympiad to undergraduate optimization. Starting from a strong Olympiad-level prover, our pipeline mitigates distribution shift through two key innovations. First, we employ large-scale optimization-focused data curation via expert iteration. Second, we introduce a specialized preference learning objective that integrates perplexity-weighted optimization with a mechanism to penalize valid but non-progressing proof steps. This not only addresses distribution shifts but also guides the search toward efficient trajectories. To enable rigorous evaluation, we construct a novel benchmark in Lean 4 focused on optimization. On this benchmark, OptProver achieves state-of-the-art Pass@1 and Pass@32 among comparably sized models while maintaining competitive performance on general theorem-proving tasks, demonstrating effective domain transfer without catastrophic forgetting.