E$^2$DT: Efficient and Effective Decision Transformer with Experience-Aware Sampling for Robotic Manipulation

arXiv cs.RO / 5/4/2026


Key Points

  • The paper argues that Decision Transformer (DT) performance in robotic manipulation depends strongly on how well collected experiences cover the relevant trajectory space, and that uniform replay limits sample efficiency and exploration.
  • It proposes E$^2$DT, a DT-guided k-Determinantal Point Process (k-DPP) sampling method that actively selects training experiences using both quality and diversity criteria.
  • E$^2$DT is experience-aware: it prioritizes high-return, high-uncertainty, and underrepresented trajectories, while DT's latent embeddings are used to maintain diversity across trajectory windows.
  • The method quantifies experience quality via a composite metric that integrates return-to-go (RTG) quantiles, predictive uncertainty, and inverse-frequency stage coverage, then combines this quality signal with embedding-based diversity in a joint quality-diversity kernel (see the sketch after this list).
  • Experiments on robotic manipulation benchmarks in simulation and on real robots show that E$^2$DT consistently outperforms prior approaches, suggesting a principled way to improve long-horizon robotic learning.
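
To make the quality-diversity kernel and k-DPP selection concrete, here is a minimal sketch; it is not the paper's implementation. It assumes per-window quality scores and DT latent embeddings are already computed (both are hypothetical inputs here), builds a joint kernel L_ij = q_i · S_ij · q_j from quality and embedding similarity, and selects k windows with a greedy MAP-style approximation of k-DPP sampling.

```python
# Minimal sketch (not the authors' code): quality-diversity joint kernel and
# greedy k-DPP-style selection over trajectory windows. `embeddings` (DT latent
# embeddings, one vector per window) and `quality` (per-window quality scores)
# are assumed inputs for illustration.
import numpy as np

def joint_kernel(embeddings: np.ndarray, quality: np.ndarray) -> np.ndarray:
    """Build L_ij = q_i * S_ij * q_j, where S is cosine similarity of embeddings."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarity = normed @ normed.T          # pairwise cosine similarity
    return quality[:, None] * similarity * quality[None, :]

def greedy_kdpp(L: np.ndarray, k: int) -> list[int]:
    """Greedy MAP approximation of k-DPP sampling: at each step, add the window
    that maximizes the log-determinant of the selected kernel submatrix."""
    selected: list[int] = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        if best is None:                     # no positive-definite extension found
            break
        selected.append(best)
    return selected
```

Because the determinant rewards both large diagonal entries (quality) and dissimilar rows (diversity), the selected subset trades the two off automatically, which is the property the joint kernel is designed to exploit.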

Abstract

In reinforcement learning (RL) for robotic manipulation, the Decision Transformer (DT) has emerged as an effective framework for addressing long-horizon tasks. However, DT's performance depends heavily on the coverage of collected experiences. Without an active exploration mechanism, standard DT relies on uniform replay, which leads to poor sample efficiency, limited exploration, and reduced overall effectiveness. At the same time, while broader exploration can help avoid local optima, excessive exploration often delays policy convergence and degrades efficiency. To address these limitations, we propose E$^2$DT, a DT-guided k-Determinantal Point Process (k-DPP) sampling framework that enables the model to actively shape its own experience selection. Our framework is experience-aware, allowing E$^2$DT to be both efficient, by prioritizing high-quality samples such as high-return, high-uncertainty, and underrepresented trajectories, and effective, by ensuring diversity across trajectory windows to preserve policy optimality. Specifically, DT's internal latent embeddings measure diversity across trajectory windows, while quality is quantified through a composite metric that integrates return-to-go (RTG) quantiles, predictive uncertainty, and inverse-frequency stage coverage. These two dimensions are integrated into a novel quality-diversity joint kernel that prioritizes the most informative experiences, thereby enabling learning that is both efficient and effective. We evaluate E$^2$DT on challenging robotic manipulation benchmarks in both simulation and real-robot settings. Results show that it consistently outperforms prior methods. These findings demonstrate that coupling policy learning with experience-aware sampling provides a principled path toward robust long-horizon robotic learning.
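
The composite quality metric described in the abstract can also be pictured with a short sketch. The following is a rough illustration under assumptions, not the paper's formulation: the inputs (`rtg`, `uncertainty`, `stages`), the equal default weights, and the normalizations are all hypothetical choices made for clarity.

```python
# Rough sketch (assumptions, not the paper's formulation): a per-window quality
# score combining an RTG quantile rank, normalized predictive uncertainty, and
# inverse-frequency stage coverage.
import numpy as np
from collections import Counter

def quality_scores(rtg: np.ndarray,
                   uncertainty: np.ndarray,
                   stages: np.ndarray,
                   weights: tuple = (1.0, 1.0, 1.0)) -> np.ndarray:
    """Quality = weighted sum of RTG quantile, uncertainty, and stage coverage."""
    # RTG quantile rank in [0, 1]: higher-return windows score higher.
    rtg_quantile = np.argsort(np.argsort(rtg)) / max(len(rtg) - 1, 1)
    # Predictive uncertainty (e.g., ensemble disagreement), min-max normalized.
    unc = (uncertainty - uncertainty.min()) / (np.ptp(uncertainty) + 1e-8)
    # Inverse-frequency stage coverage: windows from rare task stages score higher.
    counts = Counter(stages.tolist())
    inv_freq = np.array([1.0 / counts[s] for s in stages.tolist()])
    coverage = inv_freq / inv_freq.max()
    w1, w2, w3 = weights
    return w1 * rtg_quantile + w2 * unc + w3 * coverage
```

In a pipeline like the one sketched after the key points, these scores would play the role of the quality term on the kernel's diagonal, while the embedding similarity supplies the off-diagonal diversity term.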