AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling
arXiv cs.AI / 3/24/2026
Key Points
- The paper introduces AgentHER, which adapts Hindsight Experience Replay (HER) to natural-language LLM agent trajectories by relabeling failed runs as successful demonstrations for alternative achievable goals.
- AgentHER uses a four-stage pipeline—failure classification, outcome extraction, LLM-guided prompt relabeling with confidence gating, and data packaging—producing offline training data in SFT, DPO, and ShareGPT formats.
- Experiments on WebArena and ToolBench show AgentHER improves over success-only training by +7.1 to +11.7 percentage points across multiple model families, while achieving about 2x data efficiency (matching performance with roughly half the successful demonstrations).
- The method scales consistently across model sizes (about 1.5B to 72B parameters) and further improves under iterative redeployment, indicating it can compound gains across training rounds.
- Human evaluation reports high relabeling precision (97.7%) using multi-judge verification, supporting the quality of recovered training signal from discarded failures.