AlphaInventory: Evolving White-Box Inventory Policies via Large Language Models with Deployment Guarantees

arXiv cs.AI / 5/4/2026


Key Points

  • The paper proposes AlphaInventory, an end-to-end framework that uses large language models to evolve inventory policies in online, non-stationary demand settings.
  • AlphaInventory is built around reinforcement learning and confidence-interval-based certification, generating white-box inventory policies with statistical safety guarantees for future deployment periods.
  • Training draws on demand data together with additional numerical and textual features beyond demand.
  • The authors provide a unified theoretical interface linking training, inference, and deployment, enabling them to bound the probability of evolving a statistically safe and improved policy.
  • Experiments on both synthetic and real-world retail data show AlphaInventory outperforming classical inventory policies and deep-learning-based methods; in canonical inventory settings, it evolves new policies that improve upon existing benchmarks.
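
To make "white-box inventory policy" concrete, here is a minimal Python sketch of a human-readable ordering rule and a simulator that scores it. It is an illustration only, not the paper's method: the names base_stock_policy and simulate_costs, the cost parameters, and the zero-lead-time, lost-sales dynamics are all assumptions chosen for brevity.

```python
def base_stock_policy(inventory, recent_demand, safety_factor=1.5):
    """Hypothetical white-box rule: order up to a multiple of recent mean demand."""
    mean_demand = sum(recent_demand) / len(recent_demand)
    target = safety_factor * mean_demand
    return max(0.0, target - inventory)


def simulate_costs(policy, demands, initial_demand_guess=10.0,
                   holding_cost=1.0, shortage_cost=4.0, window=5):
    """Return per-period costs for `policy` on a demand sequence.

    Assumptions (not from the paper): orders arrive immediately,
    unmet demand is lost, and costs are linear in leftover stock
    and in shortage.
    """
    inventory = 0.0
    # Warm-start the demand history with a prior guess, not with future data.
    history = [initial_demand_guess] * window
    costs = []
    for demand in demands:
        inventory += policy(inventory, history[-window:])
        sold = min(inventory, demand)
        shortage = demand - sold
        inventory -= sold
        costs.append(holding_cost * inventory + shortage_cost * shortage)
        history.append(demand)  # demand is observed only after ordering
    return costs
```

Because the policy is an ordinary function with named parameters, a planner can read, audit, and modify it directly, which is the practical sense of "white-box" here. Calling simulate_costs(base_stock_policy, demands) on a historical demand list yields per-period costs that can be compared across candidate policies.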

Abstract

We study how large language models can be used to evolve inventory policies in online, non-stationary environments. Our work is motivated by recent advances in LLM-based evolutionary search, such as AlphaEvolve, which demonstrates strong performance on static and highly structured problems such as mathematical discovery but is not directly suited to online, dynamic inventory settings. To this end, we propose AlphaInventory, an end-to-end inventory-policy evolution and inference framework grounded in confidence-interval-based certification. The framework trains a large language model using reinforcement learning, incorporates demand data as well as numerical and textual features beyond demand, and generates white-box inventory policies with statistical safety guarantees for deployment in future periods. We further introduce a unified theoretical interface that connects training, inference, and deployment. This allows us to characterize the probability that AlphaInventory evolves a statistically safe and improved policy, and to quantify the deployment gap relative to the oracle-safe benchmark. Tested on both synthetic data and real-world retail data, AlphaInventory outperforms classical inventory policies and deep-learning-based methods. In canonical inventory settings, it evolves new policies that improve upon existing benchmarks.
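
The abstract does not spell out the certification procedure, so the following Python sketch shows only one generic way a confidence-interval check could gate deployment. The function name certify_improvement, the paired per-period cost comparison, and the one-sided normal approximation are assumptions for illustration, not the authors' construction.

```python
import math
from statistics import NormalDist, mean, stdev


def certify_improvement(baseline_costs, candidate_costs, confidence=0.95):
    """Return (certified, lower_bound) for the mean per-period cost saving.

    Hypothetical rule: certify the candidate only if a one-sided lower
    confidence bound on (baseline cost - candidate cost) is above zero.
    Assumes paired observations and a normal approximation; under
    non-stationary demand the independence this relies on may not hold.
    """
    if len(baseline_costs) != len(candidate_costs):
        raise ValueError("cost sequences must have the same length")
    savings = [b - c for b, c in zip(baseline_costs, candidate_costs)]
    n = len(savings)
    if n < 2:
        raise ValueError("need at least two periods to estimate variance")
    z = NormalDist().inv_cdf(confidence)  # one-sided critical value
    lower_bound = mean(savings) - z * stdev(savings) / math.sqrt(n)
    return lower_bound > 0.0, lower_bound
```

A candidate that fails the check would simply not replace the incumbent policy, so the incumbent acts as the safe fallback. A stricter confidence level lowers the chance of deploying a policy that is not actually better, at the price of rejecting more genuine improvements.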