Inverse Optimization with Fenchel-Young Losses: Regret Bounds and the Role of Geometry

arXiv stat.ML / 5/5/2026


Key Points

  • The paper introduces an inverse optimization method based on Fenchel-Young (FY) losses, identifying the suboptimality of a regularized forward problem with an FY loss from structured prediction (the identity is written out after this list).
  • The proposed surrogate objective is convex, has a closed-form gradient, and is robust to noisy or even infeasible observations, with SGD training whose per-iteration cost does not scale with the sample size n.
  • The authors provide regret-focused theory, showing that the FY estimator’s excess risk bounds deployment-relevant regret, including geometry-aware rates for strongly convex and polyhedral feasible sets.
  • Empirically, the FY approach outperforms several KKT-residual and variational-inequality-style baselines on synthetic tasks and on Uber Movement Los Angeles shortest-path data, achieving sub-percent regret while being 10–100x faster.
  • Overall, the work emphasizes how geometric properties of the feasible set and margin conditions improve learning and optimization guarantees in data-driven inverse optimization.
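
To make the link in the first bullet concrete, here is the standard form of that identity from the Fenchel-Young loss literature, written for a generic cost vector c, convex feasible set C, and regularizer Ω with strength λ; the paper's exact notation and parameterization may differ.

```latex
% Regularized forward problem and its solution map:
\[
  \hat{y}_{\lambda}(c) \;=\; \operatorname*{arg\,min}_{y \in \mathcal{C}}
    \;\langle c, y \rangle + \lambda\, \Omega(y)
\]
% Suboptimality of an observed decision y_obs under that regularized problem;
% viewed as a function of c, this is a Fenchel-Young loss:
\[
  S_{\lambda}(c;\, y_{\mathrm{obs}}) \;=\;
    \langle c, y_{\mathrm{obs}} \rangle + \lambda\, \Omega(y_{\mathrm{obs}})
    \;-\; \min_{y \in \mathcal{C}} \bigl( \langle c, y \rangle + \lambda\, \Omega(y) \bigr)
\]
% Convex in c (a linear term plus a pointwise maximum of affine functions),
% with a closed-form gradient by Danskin's theorem:
\[
  \nabla_{c}\, S_{\lambda}(c;\, y_{\mathrm{obs}}) \;=\; y_{\mathrm{obs}} - \hat{y}_{\lambda}(c)
\]
```

This is why the surrogate admits plain stochastic-gradient training: each gradient evaluation only requires solving one regularized forward problem at the current parameters.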

Abstract

Data-driven inverse optimization estimates unknown parameters of an optimization model from noisy and possibly suboptimal decision observations, with applications spanning logistics, portfolio choice, assortment, and energy markets. Existing approaches based on KKT residuals, variational inequalities, or squared distance to observed decisions are non-convex or non-differentiable, scale poorly with sample size, and frequently degenerate under noise. We propose a Fenchel-Young (FY) loss approach for inverse optimization with linear forward objectives over convex feasible sets, observing that the suboptimality loss of a regularized forward problem coincides with the Fenchel-Young loss from structured prediction. The resulting surrogate is convex, has a closed-form gradient, is robust to infeasible observations, and trains by stochastic gradient descent at per-iteration cost independent of n. On the theoretical side, we show that the FY estimator's excess risk controls the deployment-relevant regret, with geometry-aware upper bounds: a parametric Õ(p/n) rate for strongly convex feasible sets with no regularization required, and Õ((p/n)^((β+1)/(2β))) at λ* ≍ n^(−1/(2β)) for polyhedral feasible sets under a margin condition with exponent β ≥ 1. Empirically, the FY estimator outperforms KKT-residual, variational-inequality, semiparametric, and maximum-optimality-margin baselines on both synthetic benchmarks (covering polyhedral, smooth, and strongly convex feasible sets) and the Uber Movement Los Angeles shortest-path dataset, achieving sub-percent regret on the latter while running 10 to 100 times faster.
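
The training loop described in the abstract is simple to picture. Below is a minimal, self-contained sketch of SGD on an FY-type suboptimality loss, under illustrative assumptions that are not from the paper: the feasible set is the probability simplex, the regularizer is negative entropy (so the regularized forward solution is a softmax), and the unknown parameter is the cost vector itself rather than a feature-dependent model. It is meant only to show the gradient form and that each step touches a single observation.

```python
# Sketch of FY-loss inverse optimization by SGD, under illustrative assumptions
# NOT taken from the paper: feasible set = probability simplex, regularizer =
# negative entropy, unknown parameter = the cost vector c itself.
import numpy as np

rng = np.random.default_rng(0)
d, n, lam = 5, 2000, 0.5               # dimension, sample size, regularization strength
c_true = rng.normal(size=d)            # ground-truth cost vector (unknown to the learner)

def reg_argmin(c, lam):
    """Solution of min_{y in simplex} <c, y> + lam * sum(y * log y), i.e. softmax(-c / lam)."""
    z = -c / lam
    z -= z.max()                       # numerical stabilization
    p = np.exp(z)
    return p / p.sum()

# Noisy, possibly suboptimal decision observations from the true forward problem.
Y_obs = np.array([reg_argmin(c_true + 0.3 * rng.normal(size=d), lam) for _ in range(n)])

# SGD on the FY loss: grad_c S_lam(c; y_obs) = y_obs - reg_argmin(c, lam).
# Each step uses a single observation, so per-iteration cost does not grow with n.
c_hat, step = np.zeros(d), 0.5
for t in range(20000):
    y = Y_obs[rng.integers(n)]
    grad = y - reg_argmin(c_hat, lam)
    c_hat -= step / np.sqrt(t + 1) * grad

# On the simplex, costs are identified only up to an additive shift, and agreement
# is approximate because the observations are noisy; compare centered vectors.
print(np.round(c_hat - c_hat.mean(), 2))
print(np.round(c_true - c_true.mean(), 2))
```

For a polyhedral feasible set such as the shortest-path polytope used in the Uber Movement experiment, the softmax call would be replaced by a call to a regularized forward solver, and the cost vector would typically be a parametric function of context features; the structure of the loop stays the same.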