Can Tabular Foundation Models Guide Exploration in Robot Policy Learning?

arXiv cs.RO / 5/1/2026


Key Points

  • The paper addresses sample-efficient policy optimization in continuous control for robotics, where existing methods are often either local (sensitive to initialization and tuning) or global (expensive in rollouts).
  • It introduces TFM-S3, a tabular hybrid local–global approach that alternates frequent local updates with periodic global search to improve exploration without significantly increasing rollout cost.
  • During each global search round, TFM-S3 builds a dynamically updated low-dimensional policy subspace using SVD and refines policies via iterative surrogate-guided optimization within that subspace.
  • The method leverages a pretrained tabular foundation model that predicts candidate returns from a small context set, allowing large-scale screening while using limited real rollouts.
  • Experiments on continuous control benchmarks show that TFM-S3 accelerates early convergence and improves final performance over TD3 and population-based baselines under the same rollout budget, supporting the value of foundation models for robotics policy learning.
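The subspace step in the third point can be sketched concretely. The snippet below is a minimal illustration, not the paper's implementation: it assumes a history of flattened policy parameter snapshots, centers them, takes a truncated SVD, and samples candidate policies inside the resulting low-dimensional subspace. All shapes and the sampling scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical history of flattened policy parameter vectors
# (e.g. snapshots saved during local training); shapes are illustrative.
n_snapshots, n_params, k = 32, 1024, 8
P = rng.normal(size=(n_snapshots, n_params))

# Center the snapshots and take a truncated SVD; the top-k right
# singular vectors span a low-dimensional subspace around the mean policy.
mean = P.mean(axis=0)
U, S, Vt = np.linalg.svd(P - mean, full_matrices=False)
basis = Vt[:k]                      # (k, n_params)

# Sample candidate policies by perturbing coordinates in the subspace,
# then map them back to full parameter space for evaluation.
coords = rng.normal(scale=S[:k] / np.sqrt(n_snapshots), size=(100, k))
candidates = mean + coords @ basis  # (100, n_params)
```

Because the basis is rebuilt from recent snapshots, the subspace tracks where the local optimizer is currently operating rather than staying fixed.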

Abstract

Policy optimization in high-dimensional continuous control for robotics remains a challenging problem. Predominant methods are inherently local and often require extensive tuning and carefully chosen initial guesses for good performance, whereas more global and less initialization-sensitive search methods typically incur high rollout costs. We propose TFM-S3, a tabular hybrid local-global method for improving global exploration in robot policy learning with limited rollout cost. We interleave high-frequency local updates with intermittent rounds of global search. In each search round, we construct a dynamically updated low-dimensional policy subspace via SVD and perform iterative surrogate-guided refinement within this space. A pretrained tabular foundation model predicts candidate returns from a small context set, enabling large-scale screening with limited rollout cost. Experiments on continuous control benchmarks show that TFM-S3 consistently accelerates early-stage convergence and improves final performance compared to TD3 and population-based baselines under an identical rollout budget. These results demonstrate that foundation models are a powerful new tool for creating sample-efficient policy learning methods for continuous control in robotics.
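The screening loop described in the abstract can be sketched as follows. The paper uses a pretrained tabular foundation model to predict returns in-context from a small set of evaluated policies; the snippet below substitutes a simple nearest-neighbor regressor as a stand-in for that model, and uses a cheap synthetic objective in place of a real rollout. Only the top-ranked candidates are spent on real evaluations.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_return(theta):
    # Stand-in for an expensive rollout: a cheap synthetic objective.
    return -np.sum((theta - 0.5) ** 2)

# Small context set of (policy, return) pairs from real rollouts.
ctx_X = rng.normal(size=(16, 10))
ctx_y = np.array([true_return(x) for x in ctx_X])

def surrogate_predict(cands, ctx_X, ctx_y, k=3):
    # Stand-in for the tabular foundation model: predict each
    # candidate's return as the mean over its k nearest context points.
    d = np.linalg.norm(cands[:, None, :] - ctx_X[None, :, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, :k]
    return ctx_y[nn].mean(axis=1)

# Screen many candidates with the surrogate; roll out only the top few.
cands = rng.normal(size=(500, 10))
scores = surrogate_predict(cands, ctx_X, ctx_y)
top = cands[np.argsort(scores)[-5:]]
real_returns = [true_return(t) for t in top]
```

The design point is the asymmetry in cost: surrogate predictions over hundreds of candidates are nearly free, so the rollout budget is reserved for the handful the surrogate ranks highest.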
