Efficient Controller Learning from Human Preferences and Numerical Data Via Multi-Modal Surrogate Models
arXiv cs.LG / 3/26/2026
Key Points
- The paper addresses the challenge of tuning control policies to satisfy high-level objectives, where objective functions may be subjective and driven by human preferences rather than quantitative metrics.
- It proposes a multi-fidelity, multi-modal Bayesian optimization method that combines low-fidelity numerical evaluations with high-fidelity pairwise human preference comparisons in a unified framework.
- The approach uses Gaussian-process surrogate models with two alternative structures (hierarchical autoregressive and non-hierarchical coregionalization) to learn efficiently from mixed data modalities.
- In a case study on tuning an autonomous vehicle trajectory planner, the authors show that combining numerical and preference data reduces the number of experiments that require a human decision-maker, while still adapting driving style to individual preferences.
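
To make the key points more concrete, the following is a minimal sketch of the two ingredients they name: an autoregressive two-fidelity Gaussian-process prior and a pairwise preference likelihood. It is not the authors' implementation. The summary does not give the paper's kernels, likelihood, or hyperparameters, so everything here is an assumption chosen for illustration: the squared-exponential kernel, the linear link `f_high = rho * f_low + delta`, the probit preference model, and all names and values (`rho`, `delta_variance`, `sigma`, `noise`). Only the low-fidelity numerical data is conditioned on; inference from the preference comparisons themselves would need an approximate method and is left out.

```python
import math
import numpy as np


def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def autoregressive_cov(x_low, x_high, rho=0.8, delta_variance=0.1):
    """Joint prior covariance of [f_low(x_low), f_high(x_high)].

    Assumes f_high = rho * f_low + delta, with independent zero-mean GP
    priors on f_low and on the discrepancy delta (hypothetical values).
    """
    k_ll = rbf(x_low, x_low)
    k_lh = rho * rbf(x_low, x_high)
    k_hh = rho**2 * rbf(x_high, x_high) + rbf(x_high, x_high, variance=delta_variance)
    return np.block([[k_ll, k_lh], [k_lh.T, k_hh]])


def high_fidelity_mean(x_low, y_low, x_query, rho=0.8, noise=1e-4):
    """Posterior mean of f_high at x_query given low-fidelity numerical data only."""
    k_ll = rbf(x_low, x_low) + noise * np.eye(len(x_low))
    k_ql = rho * rbf(x_query, x_low)  # cross-covariance between f_high and f_low
    return k_ql @ np.linalg.solve(k_ll, y_low)


def preference_probability(f_a, f_b, sigma=0.1):
    """Probit probability that option a is preferred over option b.

    Assumes each latent utility is perturbed by Gaussian noise with
    standard deviation sigma before the comparison is made.
    """
    z = (f_a - f_b) / (math.sqrt(2.0) * sigma)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Usage: cheap numerical evaluations at four controller settings,
# then a predicted preference between two untested settings.
x_low = np.array([0.0, 2.0, 4.0, 6.0])
y_low = np.sin(x_low)  # stand-in for a low-fidelity numerical score
candidates = np.array([1.0, 3.0])
means = high_fidelity_mean(x_low, y_low, candidates)
p_first_preferred = preference_probability(means[0], means[1])
```

In this sketch the low-fidelity data shapes the high-fidelity prediction before any human is queried, which is the mechanism the key points credit with reducing the number of comparisons a decision-maker has to make.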