OptProver: Bridging Olympiad and Optimization through Continual Training in Formal Theorem Proving

arXiv cs.LG / 4/28/2026

📰 NewsDeveloper Stack & InfrastructureModels & Research

共有:

Key Points

The paper introduces OptProver, a formally trained model aimed at bridging the gap between Olympiad-level theorem proving and undergraduate optimization topics, which have been less explored by current systems.
It argues that naive transfer fails due to major distribution shift caused by optimization’s reliance on specialized formalisms such as convexity, optimality conditions, and algorithmic analysis.
OptProver mitigates this shift by using large-scale optimization-focused data curation via expert iteration and a specialized preference-learning objective that combines perplexity-weighted optimization with penalties for valid but non-progressing proof steps.
The authors build a new Lean 4 benchmark for optimization and report state-of-the-art Pass@1 and Pass@32 results among similarly sized models, while preserving strong performance on general theorem-proving tasks (avoiding catastrophic forgetting).

Abstract

Recent advances in formal theorem proving have focused on Olympiad-level mathematics, leaving undergraduate domains largely unexplored. Optimization, fundamental to machine learning, operations research, and scientific computing, remains underserved by existing provers. Its reliance on domain-specific formalisms (convexity, optimality conditions, and algorithmic analysis) creates significant distribution shift, making naive domain transfer ineffective. We present OptProver, a trained model that achieves robust transfer from Olympiad to undergraduate optimization. Starting from a strong Olympiad-level prover, our pipeline mitigates distribution shift through two key innovations. First, we employ large-scale optimization-focused data curation via expert iteration. Second, we introduce a specialized preference learning objective that integrates perplexity-weighted optimization with a mechanism to penalize valid but non-progressing proof steps. This not only addresses distribution shifts but also guides the search toward efficient trajectories. To enable rigorous evaluation, we construct a novel benchmark in Lean 4 focused on optimization. On this benchmark, OptProver achieves state-of-the-art Pass@1 and Pass@32 among comparably sized models while maintaining competitive performance on general theorem-proving tasks, demonstrating effective domain transfer without catastrophic forgetting.