
SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training

arXiv cs.LG / 3/20/2026

📰 News · Models & Research

Key Points

  • SLEA-RL introduces step-level experience augmentation for multi-turn LLM agents by retrieving experiences at each decision step conditioned on the current observation.
  • It comprises three components: step-level observation clustering for efficient, structure-preserving retrieval; a self-evolving experience library that uses score-based admission and rate-limited extraction to distill strategies and failure patterns; and policy optimization with step-level credit assignment for fine-grained advantage estimation across multi-turn episodes.
  • The library evolves alongside the policy through semantic analysis rather than gradient updates, so stored experiences adapt continually without being trained directly.
  • Experiments on long-horizon multi-turn benchmarks show SLEA-RL achieves superior performance over various RL baselines.
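The first component, cluster-indexed step-level retrieval, can be illustrated with a minimal sketch. This is not the paper's implementation; the structural signature (keys plus value types), the `ClusterIndexedLibrary` class, and all names here are illustrative assumptions, standing in for whatever clustering and indexing scheme SLEA-RL actually uses.

```python
import hashlib
from collections import defaultdict

def observation_signature(observation: dict) -> str:
    # Hypothetical structural signature: hash the observation's keys and
    # value *types* only, so observations that differ in surface values
    # but share structure fall into the same cluster.
    skeleton = sorted((k, type(v).__name__) for k, v in observation.items())
    return hashlib.sha1(repr(skeleton).encode()).hexdigest()

class ClusterIndexedLibrary:
    """Toy experience library indexed by the structural cluster of the
    observation each experience was recorded under."""

    def __init__(self):
        self._clusters = defaultdict(list)

    def add(self, observation: dict, experience: str) -> None:
        self._clusters[observation_signature(observation)].append(experience)

    def retrieve(self, observation: dict, k: int = 3) -> list:
        # Step-level retrieval: condition on the *current* observation at
        # each decision step, not on the initial task description.
        return self._clusters[observation_signature(observation)][-k:]

lib = ClusterIndexedLibrary()
lib.add({"tool": "search", "query": "foo"}, "narrow the query before paging")
lib.add({"tool": "search", "query": "bar"}, "check result count first")
hits = lib.retrieve({"tool": "search", "query": "baz"})
```

Because retrieval keys on structure rather than content, an agent facing a new search observation still recovers experiences gathered under structurally equivalent states from earlier episodes.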

Abstract

Large Language Model (LLM) agents have shown strong results on multi-turn tool-use tasks, yet they operate in isolation during training, failing to leverage experiences accumulated across episodes. Existing experience-augmented methods address this by organizing trajectories into retrievable libraries, but they retrieve experiences only once based on the initial task description and hold them constant throughout the episode. In multi-turn settings where observations change at every step, this static retrieval becomes increasingly mismatched as episodes progress. We propose SLEA-RL (Step-Level Experience-Augmented Reinforcement Learning), a framework that retrieves relevant experiences at each decision step conditioned on the current observation. SLEA-RL operates through three components: (i) step-level observation clustering that groups structurally equivalent environmental states for efficient cluster-indexed retrieval; (ii) a self-evolving experience library that distills successful strategies and failure patterns through score-based admission and rate-limited extraction; and (iii) policy optimization with step-level credit assignment for fine-grained advantage estimation across multi-turn episodes. The experience library evolves alongside the policy through semantic analysis rather than gradient updates. Experiments on long-horizon multi-turn agent benchmarks demonstrate that SLEA-RL achieves superior performance compared to various reinforcement learning baselines.