Efficient Controller Learning from Human Preferences and Numerical Data Via Multi-Modal Surrogate Models

arXiv cs.LG / 3/26/2026

Key Points

  • The paper addresses the challenge of tuning control policies to satisfy high-level objectives, where objective functions may be subjective and driven by human preferences rather than quantitative metrics.
  • It proposes a multi-fidelity, multi-modal Bayesian optimization method that combines low-fidelity numerical evaluations with high-fidelity pairwise human preference comparisons in a unified framework.
  • The approach uses Gaussian-process surrogate models with two alternative structures, one hierarchical and autoregressive, the other non-hierarchical and based on coregionalization, to learn efficiently from mixed data modalities.
  • In a case study on tuning an autonomous vehicle's trajectory planner, the authors show that combining numerical and preference data reduces the number of experiments that require a human decision maker, while still adapting driving style to individual preferences.
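
To make the two surrogate structures named in the key points more concrete, the sketch below builds the joint prior covariance over a low-fidelity and a high-fidelity output in both ways. It is an illustration under stated assumptions, not the authors' implementation: the autoregressive form assumes the textbook relation f_hi(x) = rho * f_lo(x) + delta(x), the coregionalization form assumes an intrinsic coregionalization model with B = W Wᵀ + diag(kappa), and the function names, kernel choice, and hyperparameter values are all hypothetical.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def autoregressive_cov(X_lo, X_hi, rho=0.8, ls_lo=1.0, ls_delta=0.5):
    """Hierarchical structure (assumed): f_hi = rho * f_lo + delta,
    with f_lo and delta independent GPs."""
    K_ll = rbf(X_lo, X_lo, ls_lo)
    K_lh = rho * rbf(X_lo, X_hi, ls_lo)
    K_hh = rho**2 * rbf(X_hi, X_hi, ls_lo) + rbf(X_hi, X_hi, ls_delta)
    return np.block([[K_ll, K_lh], [K_lh.T, K_hh]])

def icm_cov(X, task, W, kappa, lengthscale=1.0):
    """Non-hierarchical structure (assumed): one shared input kernel scaled
    by a task covariance B = W W^T + diag(kappa)."""
    B = W @ W.T + np.diag(kappa)
    return B[np.ix_(task, task)] * rbf(X, X, lengthscale)

# Hypothetical data: 5 low-fidelity and 3 high-fidelity controller parameter sets.
rng = np.random.default_rng(0)
X_lo = rng.uniform(size=(5, 2))
X_hi = rng.uniform(size=(3, 2))

K_ar = autoregressive_cov(X_lo, X_hi)

X_all = np.vstack([X_lo, X_hi])
task = np.array([0] * 5 + [1] * 3)   # 0 = low fidelity, 1 = high fidelity
W = np.array([[1.0], [0.8]])         # rank-1 coupling between the two outputs
kappa = np.array([0.1, 0.1])         # output-specific variance
K_icm = icm_cov(X_all, task, W, kappa)
```

In the autoregressive form the high-fidelity output is an explicit correction of the low-fidelity one, whereas the coregionalization form treats both outputs symmetrically and lets B encode how strongly they correlate.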

Abstract

Tuning control policies manually to meet high-level objectives is often time-consuming. Bayesian optimization provides a data-efficient framework for automating this process using numerical evaluations of an objective function. However, many systems, particularly those involving humans, require optimization based on subjective criteria. Preferential Bayesian optimization addresses this by learning from pairwise comparisons instead of quantitative measurements, but relying solely on preference data can be inefficient. We propose a multi-fidelity, multi-modal Bayesian optimization framework that integrates low-fidelity numerical data with high-fidelity human preferences. Our approach employs Gaussian process surrogate models with both hierarchical, autoregressive and non-hierarchical, coregionalization-based structures, enabling efficient learning from mixed-modality data. We illustrate the framework by tuning an autonomous vehicle's trajectory planner, showing that combining numerical and preference data significantly reduces the need for experiments involving the human decision maker while effectively adapting driving style to individual preferences.
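
As a rough illustration of how pairwise comparisons can enter a Gaussian-process model, the sketch below uses a probit likelihood, a common choice in preferential Bayesian optimization. The abstract does not state which likelihood the paper uses, so this is an assumption, and the function names, noise level, and example utilities are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def preference_prob(f_a, f_b, noise_std=0.1):
    """Probability that option a is preferred over option b, assuming each
    latent utility is perturbed by independent Gaussian noise."""
    return normal_cdf((f_a - f_b) / (math.sqrt(2.0) * noise_std))

def log_likelihood(utilities, comparisons, noise_std=0.1):
    """Sum of log-probabilities over (winner, loser) index pairs."""
    total = 0.0
    for winner, loser in comparisons:
        p = preference_prob(utilities[winner], utilities[loser], noise_std)
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total

# Hypothetical latent utilities of three candidate controller settings.
utilities = [0.2, 0.5, 0.1]

# Comparisons that agree with the utilities, and the same pairs flipped.
consistent = [(1, 0), (0, 2), (1, 2)]
flipped = [(loser, winner) for winner, loser in consistent]

ll_consistent = log_likelihood(utilities, consistent)
ll_flipped = log_likelihood(utilities, flipped)
```

Comparisons that agree with the latent utilities receive a higher log-likelihood than the flipped ones, which is the signal a surrogate model can use to infer utilities from preference data alone.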