Efficient Controller Learning from Human Preferences and Numerical Data Via Multi-Modal Surrogate Models

arXiv cs.LG / 3/26/2026

Key Points

The paper addresses the challenge of tuning control policies to satisfy high-level objectives, where objective functions may be subjective and driven by human preferences rather than quantitative metrics.
It proposes a multi-fidelity, multi-modal Bayesian optimization method that combines low-fidelity numerical evaluations with high-fidelity pairwise human preference comparisons in a unified framework.
The approach uses Gaussian process surrogate models with two alternative structures (hierarchical autoregressive and non-hierarchical coregionalization-based) to learn efficiently from mixed data modalities.
In a case study on tuning an autonomous vehicle's trajectory planner, the authors show that combining numerical and preference data significantly reduces the number of experiments that require the human decision maker, while still adapting driving style to individual preferences.
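
For orientation, the two surrogate structures named above are commonly written in the textbook forms below. This is only a sketch: the summary does not give the paper's exact parameterization, so the symbols (low- and high-fidelity latent functions f_L and f_H, scaling factor \rho, discrepancy process \delta, coregionalization matrix B, and kernels k_L, k_\delta, k) are assumptions introduced here for illustration.

```latex
% Hierarchical autoregressive structure (standard AR(1) multi-fidelity form):
% the high-fidelity function is a scaled copy of the low-fidelity one plus an
% independent discrepancy term.
f_H(x) = \rho\, f_L(x) + \delta(x), \qquad
f_L \sim \mathcal{GP}(0, k_L), \quad
\delta \sim \mathcal{GP}(0, k_\delta), \quad
f_L \perp \delta

% Non-hierarchical coregionalization (intrinsic coregionalization form):
% both outputs share one input kernel, and a positive semidefinite matrix B
% encodes how strongly the fidelities are correlated.
\operatorname{cov}\big(f_i(x), f_j(x')\big) = B_{ij}\, k(x, x'), \qquad
i, j \in \{L, H\}, \quad B \succeq 0
```

In the autoregressive form, information flows in one direction, from the low-fidelity level to the high-fidelity level. In the coregionalization form, neither output is privileged, and the correlation between them is learned through B.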

Abstract

Tuning control policies manually to meet high-level objectives is often time-consuming. Bayesian optimization provides a data-efficient framework for automating this process using numerical evaluations of an objective function. However, many systems, particularly those involving humans, require optimization based on subjective criteria. Preferential Bayesian optimization addresses this by learning from pairwise comparisons instead of quantitative measurements, but relying solely on preference data can be inefficient. We propose a multi-fidelity, multi-modal Bayesian optimization framework that integrates low-fidelity numerical data with high-fidelity human preferences. Our approach employs Gaussian process surrogate models with both hierarchical, autoregressive and non-hierarchical, coregionalization-based structures, enabling efficient learning from mixed-modality data. We illustrate the framework by tuning an autonomous vehicle's trajectory planner, showing that combining numerical and preference data significantly reduces the need for experiments involving the human decision maker while effectively adapting driving style to individual preferences.
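
To make the idea of mixed-modality data concrete, the following minimal sketch scores a set of candidate latent values against both kinds of observation: a Gaussian likelihood for the low-fidelity numerical measurements and a probit likelihood for the high-fidelity pairwise preferences. The probit form is the one commonly used in preference learning with Gaussian processes. Whether the paper uses exactly this likelihood is an assumption, as are all function names, the noise parameters, and the convention that a higher latent value means "more preferred".

```python
import math

def gaussian_loglik(y, f, noise_std):
    """Log-density of a numerical observation y given latent value f."""
    return (-0.5 * math.log(2.0 * math.pi * noise_std**2)
            - 0.5 * ((y - f) / noise_std) ** 2)

def preference_loglik(f_winner, f_loser, noise_std):
    """Log-probability that the winner is preferred, under a probit model.

    P(winner > loser) = Phi((f_winner - f_loser) / (sqrt(2) * noise_std)),
    where Phi is the standard normal CDF, written here via erfc.
    """
    z = (f_winner - f_loser) / (math.sqrt(2.0) * noise_std)
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def mixed_log_likelihood(f_low, y_low, f_high, preferences,
                         noise_low=0.1, noise_pref=0.1):
    """Sum the numerical and preference log-likelihood terms.

    f_low, y_low : latent values and measurements at the low-fidelity points
    f_high       : latent values at the high-fidelity (human-evaluated) points
    preferences  : list of (winner_index, loser_index) pairs into f_high
    """
    numeric = sum(gaussian_loglik(y, f, noise_low)
                  for y, f in zip(y_low, f_low))
    pairwise = sum(preference_loglik(f_high[w], f_high[l], noise_pref)
                   for w, l in preferences)
    return numeric + pairwise

# Hypothetical data: two numerical measurements and one human comparison
# stating that candidate 1 was preferred over candidate 0.
score = mixed_log_likelihood(
    f_low=[0.0, 1.0], y_low=[0.05, 0.95],
    f_high=[0.2, 0.9], preferences=[(1, 0)],
)
```

In a full method, the latent values would come from a joint Gaussian process prior over both fidelities, and this likelihood would be combined with that prior through approximate inference. The sketch shows only how the two data modalities contribute to a single objective.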
