GIFT: Generalizing Intent for Flexible Test-Time Rewards

arXiv cs.RO · March 25, 2026


Key Points

  • The paper introduces GIFT (Generalizing Intent for Flexible Test-Time Rewards), aiming to make robot reward functions learned from demonstrations generalize to new environments by focusing on underlying human intent rather than spurious correlations in training data.
  • GIFT uses language models to infer high-level intent from demonstrations by contrasting preferred versus non-preferred behaviors, then applies intent-conditioned similarity at test time to map novel states to behaviorally equivalent training states without retraining.
  • In simulated tabletop manipulation experiments with over 50 unseen objects across four tasks, GIFT outperforms visual and semantic-similarity baselines on both pairwise win rate and state-alignment F1.
  • Real-world tests on a 7-DoF Franka Panda robot show that the approach transfers reliably to physical settings, suggesting robustness beyond simulation.
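
The contrastive intent-inference step described above can be sketched as prompt construction: given preferred and non-preferred demonstration summaries, ask a language model to name the intent that separates them. This is an illustrative sketch only; the prompt wording, the behavior descriptions, and any downstream LM call are assumptions, not details from the paper.

```python
def build_intent_prompt(preferred, non_preferred):
    """Build a contrastive prompt asking an LM to infer the user's intent.

    `preferred` and `non_preferred` are lists of short natural-language
    behavior descriptions (hypothetical format; the paper's actual
    representation may differ).
    """
    lines = ["A user demonstrated a robot manipulation task."]
    lines.append("Preferred behaviors:")
    lines.extend(f"- {b}" for b in preferred)
    lines.append("Non-preferred behaviors:")
    lines.extend(f"- {b}" for b in non_preferred)
    lines.append(
        "In one sentence, state the underlying intent that distinguishes "
        "the preferred behaviors from the non-preferred ones."
    )
    return "\n".join(lines)


prompt = build_intent_prompt(
    preferred=["grasped the mug by its handle"],
    non_preferred=["grasped the mug by its rim"],
)
```

The resulting string would then be sent to a language model of one's choosing; the key design point is that intent is elicited from the *contrast* between behavior sets, not from either set alone.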

Abstract

Robots learn reward functions from user demonstrations, but these rewards often fail to generalize to new environments. This failure occurs because learned rewards latch onto spurious correlations in training data rather than the underlying human intent that demonstrations represent. Existing methods leverage visual or semantic similarity to improve robustness, yet these surface-level cues often diverge from what humans actually care about. We present Generalizing Intent for Flexible Test-Time Rewards (GIFT), a framework that grounds reward generalization in human intent rather than surface cues. GIFT leverages language models to infer high-level intent from user demonstrations by contrasting preferred with non-preferred behaviors. At deployment, GIFT maps novel test states to behaviorally equivalent training states via intent-conditioned similarity, enabling learned rewards to generalize across distribution shifts without retraining. We evaluate GIFT on tabletop manipulation tasks with new objects and layouts. Across four simulated tasks with over 50 unseen objects, GIFT consistently outperforms visual and semantic similarity baselines in test-time pairwise win rate and state-alignment F1 score. Real-world experiments on a 7-DoF Franka Panda robot demonstrate that GIFT reliably transfers to physical settings. Further discussion can be found at https://mit-clear-lab.github.io/GIFT/
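
The test-time step in the abstract, mapping a novel state to a behaviorally equivalent training state under an intent-conditioned metric, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the feature names, the intent-derived weights, and the Euclidean metric stand in for whatever representation GIFT actually uses. The point is only that features irrelevant to the inferred intent (e.g., object color) are down-weighted, so a visually novel state can still match the training state with the same task-relevant behavior.

```python
import math

def intent_conditioned_features(state, intent_weights):
    # Re-weight raw state features by their relevance to the inferred
    # intent; features the intent ignores get weight 0.0.
    return [state[k] * intent_weights.get(k, 0.0) for k in sorted(state)]

def match_test_state(test_state, train_states, intent_weights):
    # Map a novel test state to the behaviorally closest training state
    # under the intent-conditioned metric (smallest weighted distance).
    test_vec = intent_conditioned_features(test_state, intent_weights)
    def dist(s):
        return math.dist(test_vec, intent_conditioned_features(s, intent_weights))
    return min(train_states, key=dist)


# Hypothetical example: the intent cares about gripper pose relative to
# the handle, not the object's hue.
train_states = [
    {"dist_to_handle": 0.1, "gripper_open": 1.0, "hue": 0.9},
    {"dist_to_handle": 0.8, "gripper_open": 0.0, "hue": 0.2},
]
test_state = {"dist_to_handle": 0.12, "gripper_open": 1.0, "hue": 0.2}
intent_weights = {"dist_to_handle": 1.0, "gripper_open": 1.0}  # hue: 0.0

matched = match_test_state(test_state, train_states, intent_weights)
```

Here the test object shares its hue with the second training state, so a raw visual-similarity baseline could match it there; under the intent-conditioned metric it instead matches the first training state, whose learned reward can then be reused without retraining.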