Statistical Guarantees for Distributionally Robust Optimization with Optimal Transport and OT-Regularized Divergences

arXiv stat.ML / 3/31/2026


Key Points

  • The paper provides finite-sample statistical guarantees for distributionally robust optimization (DRO) when the uncertainty neighborhoods are defined using optimal transport (OT) and OT-regularized divergences.
  • It derives concentration inequalities for supervised learning under DRO-based adversarial training, aimed at improving machine-learning adversarial robustness.
  • The results generalize beyond the commonly studied p-Wasserstein setting by covering a broader class of OT cost functions, including soft-constraint norm-ball OT costs.
  • The authors’ theory is claimed to be the first to jointly analyze adversarial sample generation and adversarial reweighting induced by OT-regularized f-divergence neighborhoods.
  • The bounds are reported to show improved dependence on the DRO neighborhood size versus prior adversarial-setting results even in the p-Wasserstein case.
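To fix ideas, the DRO objective that such guarantees concern can be sketched in the following illustrative notation (assumed for exposition; not the paper's exact definitions):

```latex
% Illustrative notation (assumed, not taken from the paper):
%   \theta  - model parameters
%   P_n     - empirical distribution of the n training samples
%   D       - neighborhood divergence: an OT cost W_c or an
%             OT-regularized f-divergence
%   \delta  - DRO neighborhood size
\min_{\theta} \; \sup_{Q \,:\, D(Q, P_n) \le \delta} \;
  \mathbb{E}_{Z \sim Q}\bigl[ \ell_{\theta}(Z) \bigr]
```

The finite-sample guarantees then bound the gap between this worst-case empirical risk and its population counterpart, with the reported improvement appearing in the dependence on \(\delta\).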

Abstract

We study finite-sample statistical performance guarantees for distributionally robust optimization (DRO) with optimal transport (OT) and OT-regularized divergence model neighborhoods. Specifically, we derive concentration inequalities for supervised learning via DRO-based adversarial training, as commonly employed to enhance the adversarial robustness of machine learning models. Our results apply to a wide range of OT cost functions, beyond the p-Wasserstein case studied by previous authors. In particular, our results are the first to: 1) cover soft-constraint norm-ball OT cost functions, which have been shown empirically to enhance robustness when used in adversarial training; and 2) apply to the combination of adversarial sample generation and adversarial reweighting that is induced by using OT-regularized f-divergence model neighborhoods, where the added reweighting mechanism has also been shown empirically to further improve performance. In addition, even in the p-Wasserstein case, our bounds exhibit better behavior as a function of the DRO neighborhood size than previous results when applied to the adversarial setting.
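To make the soft-constraint mechanism concrete, here is a minimal one-dimensional sketch (not the authors' algorithm; the loss, penalty weight `lam`, and step sizes are all illustrative) of the Lagrangian-relaxed inner maximization commonly associated with soft-constraint OT costs in adversarial training: rather than enforcing a hard transport budget, each sample `x` is perturbed by gradient ascent on `loss(z) - lam * c(z, x)`, where `c` is the OT ground cost (here, squared distance).

```python
# Hypothetical sketch of soft-constraint adversarial sample generation.
# The transport-cost penalty lam * (z - x)**2 replaces a hard norm-ball
# constraint on the perturbation; lam > 0 trades off loss increase
# against transport cost. All names and values are illustrative.

def adversarial_perturb(x, loss_grad, lam, lr=0.05, steps=500):
    """Gradient ascent on phi(z) = loss(z) - lam * (z - x)**2."""
    z = x
    for _ in range(steps):
        # d/dz [loss(z) - lam * (z - x)**2] = loss'(z) - 2*lam*(z - x)
        z += lr * (loss_grad(z) - 2.0 * lam * (z - x))
    return z

# Toy loss loss(z) = (z - t)**2 with target t = 1.0; its gradient is
# 2*(z - t). For lam = 2 the penalized objective is strictly concave,
# so the ascent converges to the unique maximizer.
t = 1.0
loss_grad = lambda z: 2.0 * (z - t)

z_adv = adversarial_perturb(x=0.0, loss_grad=loss_grad, lam=2.0)
```

With these toy choices the penalized objective is `(z - 1)**2 - 2*z**2`, whose maximizer is `z = -1`: the perturbation moves the sample away from the target to increase the loss, while the quadratic transport penalty keeps it anchored near `x`. The f-divergence reweighting studied in the paper would additionally reweight the perturbed samples; that second mechanism is not shown here.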