AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems

arXiv cs.LG / 4/21/2026

Key Points

AutoOR is presented as a scalable pipeline that post-trains LLMs using synthetic, verified data to automatically convert natural-language operations research (OR) problems into solver-ready formulations.
The method combines synthetic data generation from standard optimization forms with reinforcement learning where solver execution feedback serves as the reward signal.
Experiments show that an 8B model trained with AutoOR achieves state-of-the-art or competitive performance on six established OR benchmarks, performing comparably to much larger frontier models.
For difficult non-linear OR problems involving physical dynamics (where prior frontier models reportedly score near 0%), AutoOR introduces a curriculum RL strategy to bootstrap from limited initial data and make the class learnable.
The authors argue that AutoOR-style approaches could meaningfully speed up industrial decision-making by reducing the OR expertise required to formalize optimization tasks.
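To make the "solver execution feedback as reward" idea concrete, here is a minimal sketch of how such a reward could be computed. It is an illustration under stated assumptions, not the authors' implementation: the summary does not describe the actual reward design, the dict-based formulation format is hypothetical, and `solve_small_ip` is a brute-force stand-in for a real solver that only works on tiny integer programs.

```python
from itertools import product


def solve_small_ip(c, A, b, upper):
    """Brute-force stand-in for a real solver (assumption, for illustration only).

    Solves: maximize c.x subject to A x <= b, with each x_j an integer in [0, upper].
    Returns the best objective value, or None if no point is feasible.
    """
    best = None
    for x in product(range(upper + 1), repeat=len(c)):
        feasible = all(
            sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
            for row, b_i in zip(A, b)
        )
        if feasible:
            value = sum(c_j * x_j for c_j, x_j in zip(c, x))
            if best is None or value > best:
                best = value
    return best


def execution_reward(candidate, reference_optimum, upper=10, tol=1e-6):
    """Hypothetical binary reward: 1.0 if the candidate formulation runs and
    reproduces the verified reference optimum, otherwise 0.0."""
    try:
        value = solve_small_ip(candidate["c"], candidate["A"], candidate["b"], upper)
    except (KeyError, TypeError):
        return 0.0  # malformed formulation, treated like a failed solver run
    if value is None:
        return 0.0  # infeasible formulation
    return 1.0 if abs(value - reference_optimum) <= tol else 0.0


# Reference problem: maximize 3x + 2y s.t. x + y <= 4, x + 3y <= 6.
good = {"c": [3, 2], "A": [[1, 1], [1, 3]], "b": [4, 6]}
# A candidate that silently drops the first constraint.
bad = {"c": [3, 2], "A": [[1, 3]], "b": [6]}

print(execution_reward(good, reference_optimum=12))
print(execution_reward(bad, reference_optimum=12))
```

The point of the sketch is that the reward depends only on what the solver returns, so a formulation that parses but encodes the wrong constraints is penalized just like one that fails to run.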

Abstract

Optimization problems are central to decision-making in manufacturing, logistics, scheduling, and other industrial settings. Translating complicated descriptions of these problems into solver-ready formulations requires specialized operations research (OR) expertise, making it hard to scale. We present AutoOR, a scalable synthetic data generation and reinforcement learning pipeline that trains LLMs to autoformalize optimization problems specified in natural language across linear, mixed-integer, and non-linear categories. AutoOR generates verified training data from standard optimization forms and uses solver execution feedback as the reward signal for RL post-training. AutoOR applied to an 8B model achieves state-of-the-art or competitive results across six established OR benchmarks, matching significantly larger frontier models. For a non-linear problem class involving physical dynamics, where frontier models score near 0%, we introduce a curriculum RL strategy that bootstraps from limited initial training data to make this class tractable for post-training. We believe that methods such as AutoOR can significantly accelerate industrial decision-making with AI.
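For readers less familiar with the three problem categories named in the abstract, the textbook forms are commonly written as below. This is general background only: the summary does not specify which standard forms AutoOR samples from, and the symbols ($c$, $A$, $b$, $f$, $g_i$, $h_j$, the index set $\mathcal{I}$) are conventional notation rather than the paper's.

```latex
% Linear program (LP)
\min_{x \in \mathbb{R}^n} \; c^\top x
\quad \text{s.t.} \quad A x \le b, \;\; x \ge 0

% Mixed-integer linear program (MILP): an LP in which some variables must be integer
\min_{x \in \mathbb{R}^n} \; c^\top x
\quad \text{s.t.} \quad A x \le b, \;\; x \ge 0, \;\; x_j \in \mathbb{Z} \;\; \text{for } j \in \mathcal{I}

% Non-linear program (NLP): objective and/or constraints may be non-linear
\min_{x \in \mathbb{R}^n} \; f(x)
\quad \text{s.t.} \quad g_i(x) \le 0, \;\; i = 1, \dots, m, \qquad h_j(x) = 0, \;\; j = 1, \dots, p
```

Autoformalization, in this notation, means recovering the problem data ($c$, $A$, $b$, or $f$, $g_i$, $h_j$) and the variable types from a natural-language description.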