AI Navigate

Contextual Preference Distribution Learning

arXiv cs.LG / 3/19/2026

Key Points

  • We introduce a sequential learning-and-optimization pipeline to learn context-dependent preference distributions for decision-making problems with uncertainty, focusing on (integer) linear programs.
  • The method uses a bounded-variance score function gradient estimator to train a predictive model that maps contextual features to parameterizable distributions, yielding a maximum likelihood estimate.
  • The model generates scenarios for unseen contexts to be used in downstream optimization, enabling risk-averse decision-making beyond point estimates.
  • In a synthetic ridesharing environment, the approach reduces average post-decision surprise by up to 114x compared to a risk-neutral baseline with perfect predictions and up to 25x versus leading risk-averse baselines.
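The score function gradient estimator named in the key points can be illustrated with a minimal sketch. This summary does not describe the paper's specific bounded-variance construction, so the code below shows only the generic score-function (REINFORCE-style) estimator with a leave-one-out baseline, a standard variance-reduction device. The Gaussian distribution, the test function, and the name score_function_gradient are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def score_function_gradient(f, mu, sigma, n_samples, rng):
    """Estimate d/dmu E_{z ~ N(mu, sigma^2)}[f(z)] without differentiating f.

    Hypothetical helper for illustration; not the paper's estimator.
    """
    z = rng.normal(mu, sigma, size=n_samples)
    values = f(z)
    # Score of a Gaussian with respect to its mean: d/dmu log p(z) = (z - mu) / sigma^2
    score = (z - mu) / sigma**2
    # Leave-one-out baseline: for sample i, the mean of all other samples' values.
    # It is independent of z_i, so subtracting it keeps the estimator unbiased
    # while typically lowering its variance.
    loo_baseline = (values.sum() - values) / (n_samples - 1)
    return float(np.mean((values - loo_baseline) * score))

rng = np.random.default_rng(0)
estimate = score_function_gradient(
    lambda z: z**2, mu=1.5, sigma=1.0, n_samples=200_000, rng=rng
)
# For f(z) = z^2, E[f(z)] = mu^2 + sigma^2, so the analytic gradient is 2 * mu = 3.0.
print(estimate)
```

The estimator only needs samples and the log-density gradient, which is why it suits settings where the quantity inside the expectation (for example, the outcome of an integer program) is not differentiable.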

Abstract

Decision-making problems often feature uncertainty stemming from heterogeneous and context-dependent human preferences. To address this, we propose a sequential learning-and-optimization pipeline to learn preference distributions and leverage them to solve downstream problems, for example risk-averse formulations. We focus on human choice settings that can be formulated as (integer) linear programs. In such settings, existing inverse optimization and choice modelling methods infer preferences from observed choices but typically produce point estimates or fail to capture contextual shifts, making them unsuitable for risk-averse decision-making. Using a bounded-variance score function gradient estimator, we train a predictive model mapping contextual features to a rich class of parameterizable distributions. This approach yields a maximum likelihood estimate. The model generates scenarios for unseen contexts in the subsequent optimization phase. In a synthetic ridesharing environment, our approach reduces average post-decision surprise by up to 114x compared to a risk-neutral approach with perfect predictions and up to 25x compared to leading risk-averse baselines.
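To make the distinction between point estimates and scenario-based, risk-averse decisions concrete, here is a small sketch. The abstract does not say which risk measure the downstream optimization uses, so conditional value-at-risk (CVaR) is an assumption here, as are the toy cost scenarios, the explicitly enumerated feasible set, and the function names cvar and choose_decision. A real instance would solve an (integer) linear program rather than enumerate decisions.

```python
import numpy as np

def cvar(losses, alpha):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of losses."""
    sorted_losses = np.sort(losses)
    k = max(1, int(np.ceil((1 - alpha) * len(sorted_losses))))
    return float(sorted_losses[-k:].mean())

def choose_decision(decisions, cost_scenarios, alpha=None):
    """Pick the index of the feasible decision with the lowest mean cost
    (alpha=None, risk-neutral) or the lowest CVaR at level alpha (risk-averse).

    decisions:      (n_decisions, n_vars) array of feasible 0/1 vectors
    cost_scenarios: (n_scenarios, n_vars) array of sampled cost vectors
    """
    losses = cost_scenarios @ decisions.T  # shape (n_scenarios, n_decisions)
    if alpha is None:
        scores = losses.mean(axis=0)
    else:
        scores = np.array([cvar(losses[:, j], alpha) for j in range(losses.shape[1])])
    return int(np.argmin(scores))

# Toy data (hypothetical): option 0 always costs 10; option 1 costs 5 in nine
# scenarios and 50 in one, so its mean is 9.5 but its tail is much worse.
decisions = np.eye(2)
cost_scenarios = np.column_stack([np.full(10, 10.0), np.array([5.0] * 9 + [50.0])])

risk_neutral = choose_decision(decisions, cost_scenarios)            # picks option 1
risk_averse = choose_decision(decisions, cost_scenarios, alpha=0.9)  # picks option 0
print(risk_neutral, risk_averse)
```

The two criteria disagree on this toy data because the mean hides the rare expensive scenario, which is the kind of information a learned preference distribution can supply and a point estimate cannot.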