Efficient Controller Learning from Human Preferences and Numerical Data Via Multi-Modal Surrogate Models
arXiv cs.LG / 2026/3/26
Key points
- The paper addresses the challenge of tuning control policies to satisfy high-level objectives, where objective functions may be subjective and driven by human preferences rather than quantitative metrics.
- It proposes a multi-fidelity, multi-modal Bayesian optimization method that combines low-fidelity numerical evaluations with high-fidelity pairwise human preference comparisons in a unified framework.
- The approach uses Gaussian-process surrogate models with two alternative structures—hierarchical autoregressive and non-hierarchical coregionalization—to efficiently learn from mixed data modalities.
- In a case study on tuning an autonomous-vehicle trajectory planner, the authors show that combining numerical and preference data reduces the number of experiments that require a human decision-maker, while still adapting the driving style to individual preferences.
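
To make the two surrogate structures and the preference modality more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the summary does not state the kernels, likelihood, or hyperparameters, so the squared-exponential kernel, the scaling factor `rho`, the coregionalization matrix `B`, and the probit preference likelihood are all illustrative assumptions, and every function name is hypothetical.

```python
import math
import numpy as np


def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def autoregressive_cov(x_lo, x_hi, rho=0.8):
    """Hierarchical (autoregressive) structure, assuming
    f_hi(x) = rho * f_lo(x) + delta(x) with f_lo and delta independent GPs.
    Returns the joint covariance of [f_lo(x_lo), f_hi(x_hi)]."""
    def k_lo(a, b):
        return rbf(a, b, lengthscale=1.0)

    def k_delta(a, b):
        return rbf(a, b, lengthscale=0.5, variance=0.3)

    top = np.hstack([k_lo(x_lo, x_lo), rho * k_lo(x_lo, x_hi)])
    bottom = np.hstack([
        rho * k_lo(x_hi, x_lo),
        rho ** 2 * k_lo(x_hi, x_hi) + k_delta(x_hi, x_hi),
    ])
    return np.vstack([top, bottom])


def coregionalization_cov(x_lo, x_hi, B):
    """Non-hierarchical (intrinsic coregionalization) structure:
    cov(f_i(x), f_j(x')) = B[i, j] * k(x, x'), with B positive semi-definite."""
    x = np.concatenate([x_lo, x_hi])
    # Fidelity index per point: 0 = low fidelity, 1 = high fidelity.
    task = np.concatenate([
        np.zeros(len(x_lo), dtype=int),
        np.ones(len(x_hi), dtype=int),
    ])
    return B[task[:, None], task[None, :]] * rbf(x, x)


def preference_prob(f_a, f_b, noise=1.0):
    """Assumed probit likelihood for a pairwise comparison:
    P(a preferred to b) = Phi((f_a - f_b) / (sqrt(2) * noise))."""
    z = (f_a - f_b) / (math.sqrt(2.0) * noise)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    x_lo = np.linspace(0.0, 1.0, 5)   # many cheap numerical evaluations
    x_hi = np.array([0.2, 0.8])       # few points compared by a human
    B = np.array([[1.0, 0.7], [0.7, 1.0]])
    print(autoregressive_cov(x_lo, x_hi).shape)
    print(coregionalization_cov(x_lo, x_hi, B).shape)
    print(preference_prob(0.5, 0.0))
```

In the autoregressive form, the high-fidelity function inherits the low-fidelity one through a single scaling factor, whereas the coregionalization form lets the off-diagonal entries of `B` set the cross-fidelity correlation without imposing an ordering between fidelities. In a full pipeline the preference likelihood would be non-Gaussian, so posterior inference would need an approximation (for example Laplace or variational inference), which this sketch leaves out.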



