Abstract
Data-driven inverse optimization estimates unknown parameters of an optimization model from noisy and possibly suboptimal decision observations, with applications spanning logistics, portfolio choice, assortment, and energy markets. Existing approaches based on KKT residuals, variational inequalities, or squared distance to observed decisions are non-convex or non-differentiable, scale poorly with the sample size, and frequently degenerate under noise. We propose a \emph{Fenchel-Young (FY) loss} approach to inverse optimization with linear forward objectives over convex feasible sets, built on the observation that the suboptimality loss of a regularized forward problem coincides with the Fenchel-Young loss from structured prediction. The resulting surrogate is convex, has a closed-form gradient, is robust to infeasible observations, and can be trained by stochastic gradient descent at a per-iteration cost independent of the sample size $n$. On the theoretical side, we show that the FY estimator's excess risk controls the deployment-relevant \emph{regret}, with geometry-aware upper bounds: a parametric $\tilde O(p/n)$ rate for strongly convex feasible sets, with no regularization required, and $\tilde O\bigl((p/n)^{(\beta+1)/(2\beta)}\bigr)$ at $\lambda^\star \asymp n^{-1/(2\beta)}$ for polyhedral feasible sets under a margin condition with exponent $\beta \geq 1$. Empirically, the FY estimator outperforms KKT-residual, variational-inequality, semiparametric, and maximum-optimality-margin baselines on both synthetic benchmarks (covering polyhedral, smooth, and strongly convex feasible sets) and the Uber Movement Los Angeles shortest-path dataset, achieving sub-percent regret on the latter while running 10--100 times faster.
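The convex surrogate and its closed-form gradient can be illustrated in the simplest regularized setting: a linear objective over the probability simplex with entropic regularization, where the regularized forward map is the softmax. The NumPy sketch below is illustrative only (the simplex setting and all names are assumptions, not the paper's general construction); it shows the FY loss $L(\theta; y) = \Omega^*(\theta) + \Omega(y) - \langle\theta, y\rangle$ and its gradient $\hat y(\theta) - y$:

```python
import numpy as np

def softmax(theta):
    # Stable softmax: the gradient of log-sum-exp, which is the conjugate
    # of negative entropy restricted to the probability simplex.
    z = theta - theta.max()
    e = np.exp(z)
    return e / e.sum()

def fy_loss_and_grad(theta, y, eps=1e-12):
    # Fenchel-Young loss for a linear objective over the simplex with
    # entropic regularization Omega(y) = sum_i y_i log y_i (illustrative case):
    #   L(theta; y) = logsumexp(theta) + Omega(y) - <theta, y>.
    # The gradient in theta is the regularized prediction minus the observation:
    #   grad = softmax(theta) - y,
    # so each SGD step costs one forward solve, independent of the sample size.
    lse = np.log(np.sum(np.exp(theta - theta.max()))) + theta.max()
    omega_y = np.sum(y * np.log(np.clip(y, eps, None)))
    loss = lse + omega_y - theta @ y
    grad = softmax(theta) - y
    return loss, grad
```

The loss is nonnegative and vanishes exactly when the regularized prediction matches the observation (here, when $\theta = \log y$ up to an additive constant), which is what makes it a well-posed convex surrogate even for noisy or infeasible observations.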