Learning What Matters Now: Dynamic Preference Inference under Contextual Shifts

arXiv cs.AI / 3/25/2026


Key Points

  • The paper addresses sequential decision-making where an agent’s preference weights are unobserved latent variables that drift with context rather than remaining fixed.
  • It introduces Dynamic Preference Inference (DPI), where the agent maintains a probabilistic belief over latent preferences, updates it from recent interactions, and conditions its policy on the inferred weights.
  • DPI is implemented as a variational preference inference module trained jointly with a preference-conditioned actor-critic, using vector-valued returns as evidence for latent trade-offs.
  • Across queueing, maze, and multi-objective continuous-control environments with event-driven objective shifts, DPI adapts its inferred preferences to new regimes and improves post-shift performance over fixed-weight and heuristic baselines.

Abstract

Humans often juggle multiple, sometimes conflicting objectives and shift their priorities as circumstances change, rather than following a fixed objective function. In contrast, most computational decision-making and multi-objective RL methods assume static preference weights or a known scalar reward. In this work, we study the sequential decision-making problem in which these preference weights are unobserved latent variables that drift with context. Specifically, we propose Dynamic Preference Inference (DPI), a cognitively inspired framework in which an agent maintains a probabilistic belief over preference weights, updates this belief from recent interactions, and conditions its policy on the inferred preferences. We instantiate DPI as a variational preference-inference module trained jointly with a preference-conditioned actor-critic, using vector-valued returns as evidence about latent trade-offs. In queueing, maze, and multi-objective continuous-control environments with event-driven changes in objectives, DPI adapts its inferred preferences to new regimes and achieves higher post-shift performance than fixed-weight and heuristic envelope baselines.
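The paper's variational module and actor-critic are not reproduced here, but the core idea of maintaining and updating a belief over latent preference weights can be illustrated with a much simpler stand-in: a Kalman-style Gaussian filter that treats each scalar utility signal as a linear observation of the hidden weights through the vector-valued return. All names, noise levels, and the shift schedule below are hypothetical; this is a minimal sketch of the belief-update idea, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 2                              # number of objectives (assumed)
sigma = 0.05                       # observation-noise std (assumed)
drift_var = 1e-3                   # process noise: lets the belief track drifting weights

# Gaussian belief over the latent preference weights
mu = np.full(K, 0.5)
Sigma = np.eye(K)

true_w = np.array([0.7, 0.3])      # hidden ground-truth weights (hypothetical)

for t in range(200):
    if t == 100:                   # event-driven contextual shift, as in the paper's setups
        true_w = np.array([0.2, 0.8])

    G = rng.normal(size=K)                     # vector-valued return for this episode
    u = true_w @ G + rng.normal(scale=sigma)   # noisy scalar utility feedback

    # Predict step: inflate covariance so old evidence decays (handles drift)
    Sigma = Sigma + drift_var * np.eye(K)

    # Update step: conjugate Gaussian update for the linear observation u = w @ G + noise
    prec = np.linalg.inv(Sigma)
    post_prec = prec + np.outer(G, G) / sigma**2
    Sigma = np.linalg.inv(post_prec)
    mu = Sigma @ (prec @ mu + G * u / sigma**2)

# Condition the policy on the inferred weights, projected onto the simplex
w_hat = np.clip(mu, 1e-8, None)
w_hat = w_hat / w_hat.sum()
print(mu.round(2), w_hat.round(2))
```

After the shift at step 100, the inflated covariance lets new evidence dominate, so the belief mean migrates toward the new regime's weights; the paper replaces this linear-Gaussian filter with a learned variational inference network trained jointly with the policy.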
