Reducing Oracle Feedback with Vision-Language Embeddings for Preference-Based RL
arXiv cs.LG · March 31, 2026
Key Points
- Preference-based reinforcement learning is often limited by the high cost of oracle feedback needed to learn reward functions from comparisons.
- The paper proposes ROVED, a hybrid approach that uses lightweight vision-language embeddings (VLEs) to generate segment-level preference labels automatically, routing only high-uncertainty comparisons to an oracle for targeted supervision (a minimal sketch of this routing appears after this list).
- ROVED adds a parameter-efficient fine-tuning strategy so the VLE is progressively adapted using the accumulated oracle feedback, improving performance over time without sacrificing scalability (see the adapter sketch after this list).
- Experiments on multiple robotic manipulation tasks show ROVED matches or exceeds prior methods while cutting oracle queries by up to 80% and delivering cumulative annotation savings of up to 90% via cross-task generalization of the adapted VLE.
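The routing idea in the second key point can be illustrated with a short sketch. The snippet below is an assumption-laden illustration, not the authors' implementation: it stands in a stub `embed` function for a frozen CLIP-style vision-language encoder, scores each trajectory segment by its similarity to a language task prompt, turns score differences into a Bradley-Terry preference probability, and sends only near-tie (high-uncertainty) pairs to an `oracle` callable. All names, the temperature, and the uncertainty band are hypothetical.

```python
# Minimal sketch of VLE-based segment preferences with uncertainty routing.
# Assumptions: `embed` stands in for a frozen CLIP-style encoder; `oracle`
# is any callable returning a preference label for a segment pair.
import numpy as np

def embed(x, dim=512):
    """Stand-in for a frozen vision-language embedder; deterministic pseudo-embedding so the sketch runs."""
    seed = abs(hash(str(x))) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

def segment_score(segment_frames, task_prompt):
    """Score a trajectory segment by mean cosine similarity between its frames and the task description."""
    text = embed(task_prompt)
    text = text / np.linalg.norm(text)
    sims = [embed(frame) @ text / np.linalg.norm(embed(frame)) for frame in segment_frames]
    return float(np.mean(sims))

def preference(seg_a, seg_b, task_prompt, temperature=0.1):
    """Bradley-Terry style probability that segment A is preferred over segment B."""
    sa, sb = segment_score(seg_a, task_prompt), segment_score(seg_b, task_prompt)
    return 1.0 / (1.0 + np.exp(-(sa - sb) / temperature))

def label_pair(seg_a, seg_b, task_prompt, oracle, uncertainty_band=0.1):
    """Use the VLE label when it is confident; route near-tie pairs to the oracle."""
    p = preference(seg_a, seg_b, task_prompt)
    if abs(p - 0.5) < uncertainty_band:          # high-uncertainty pair
        return oracle(seg_a, seg_b), "oracle"
    return (1.0 if p > 0.5 else 0.0), "vle"

# Toy usage: "segments" are just lists of frame identifiers here.
seg_a = [f"frame_a_{i}" for i in range(4)]
seg_b = [f"frame_b_{i}" for i in range(4)]
label, source = label_pair(seg_a, seg_b, "stack the red block on the blue block",
                           oracle=lambda a, b: 1.0)
print(label, source)
```

Under this reading, pairs the VLE labels confidently never reach the oracle, which is where a reduction in oracle queries of the kind reported would come from.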
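The parameter-efficient fine-tuning step can be read as adding a small trainable adapter on top of the frozen embedder and fitting it only on oracle-labelled pairs. The sketch below assumes a LoRA-style low-rank update and a Bradley-Terry cross-entropy loss; the paper's exact adapter architecture, rank, scoring head, and loss are not given in this summary, so every detail here is illustrative.

```python
# Sketch of a LoRA-style adapter on a frozen projection, trained with a
# Bradley-Terry loss on oracle-labelled segment pairs (illustrative only).
import torch
import torch.nn as nn

class LoRAProjection(nn.Module):
    """Frozen base projection plus a trainable low-rank update B @ A."""
    def __init__(self, dim=512, rank=8):
        super().__init__()
        self.base = nn.Linear(dim, dim, bias=False)
        self.base.weight.requires_grad_(False)           # keep the VLE projection frozen
        self.A = nn.Parameter(torch.randn(rank, dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(dim, rank))    # zero-init so training starts at the base model

    def forward(self, x):
        return self.base(x) + x @ self.A.t() @ self.B.t()

def fit_adapter(emb_a, emb_b, oracle_labels, epochs=100, lr=1e-3):
    """emb_a, emb_b: (N, dim) segment embeddings from the frozen VLE;
    oracle_labels: (N,) floats, 1.0 where the oracle preferred segment A."""
    model = LoRAProjection(dim=emb_a.shape[1])
    opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=lr)
    for _ in range(epochs):
        sa = model(emb_a).sum(dim=1)          # scalar score per segment (illustrative scoring head)
        sb = model(emb_b).sum(dim=1)
        p = torch.sigmoid(sa - sb)            # Bradley-Terry probability that A is preferred
        loss = nn.functional.binary_cross_entropy(p, oracle_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Toy usage with random embeddings standing in for real VLE outputs.
emb_a, emb_b = torch.randn(32, 512), torch.randn(32, 512)
labels = torch.randint(0, 2, (32,)).float()
adapter = fit_adapter(emb_a, emb_b, labels, epochs=10)
```

Because only the low-rank matrices A and B receive gradients, the number of trained parameters stays small, which is consistent with the scalability claim in the key points.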