Online Experiential Learning for Language Models

arXiv cs.CL / 3/18/2026

Key Points

OEL (Online Experiential Learning) is a two-stage framework that allows language models to continuously improve from their own deployment experience without accessing user-side environments.
The first stage extracts transferable experiential knowledge from user interaction trajectories, and the second stage consolidates it into model parameters via on-policy context distillation, forming an iterative online learning loop.
Evaluations on text-based game environments, across multiple model scales and both thinking and non-thinking variants, show consistent improvements over successive iterations in task accuracy and token efficiency while preserving out-of-distribution performance.
The analysis shows that extracted experiential knowledge is significantly more effective than raw trajectories, and that on-policy consistency between the knowledge source and the policy model is critical for effective learning.
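The iterative loop in these points can be sketched as plain control flow. This is a minimal toy under stated assumptions, not the authors' implementation: the summary does not specify any interfaces, so `oel_loop`, `collect`, `extract`, `distill` and the toy task list are hypothetical placeholders. Here a "policy" is just the set of tasks it can already solve and "knowledge" is a list of text lessons; in the actual framework the policy is a language model and consolidation updates its weights.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical toy types (not from the paper): a trajectory is (task, succeeded);
# knowledge is a list of short text "lessons".
Trajectory = Tuple[str, bool]


def oel_loop(policy, collect: Callable, extract: Callable, distill: Callable, rounds: int):
    """Alternate the two OEL stages for a fixed number of rounds (toy sketch)."""
    knowledge: List[str] = []
    success_rates: List[float] = []
    for _ in range(rounds):
        # Deployment: the current policy acts on the user side; only the
        # resulting trajectories are returned to the training side.
        trajectories: List[Trajectory] = collect(policy)
        success_rates.append(sum(ok for _, ok in trajectories) / len(trajectories))
        # Stage 1: extract transferable lessons and accumulate them.
        knowledge = extract(trajectories, knowledge)
        # Stage 2: consolidate lessons into the policy; no environment access.
        policy = distill(policy, knowledge)
    return policy, knowledge, success_rates


# ---- Hypothetical toy stand-ins (not from the paper) ----
TASKS = ["unlock door", "light lamp", "open chest"]
PREFIX = "lesson: "


def toy_collect(policy: Set[str]) -> List[Trajectory]:
    # A task succeeds only if the policy already "knows" it.
    return [(task, task in policy) for task in TASKS]


def toy_extract(trajectories: List[Trajectory], knowledge: List[str]) -> List[str]:
    # Turn each failed task into a lesson, skipping lessons already stored.
    new = [PREFIX + task for task, ok in trajectories if not ok]
    return knowledge + [k for k in new if k not in knowledge]


def toy_distill(policy: Set[str], knowledge: List[str]) -> Set[str]:
    # "Internalize" every accumulated lesson into the policy.
    return policy | {k[len(PREFIX):] for k in knowledge}


policy, knowledge, success_rates = oel_loop(set(), toy_collect, toy_extract, toy_distill, rounds=2)
print(success_rates)  # [0.0, 1.0]
```

Starting from an empty policy, the first round fails every task, accumulates three lessons and consolidates them, so the second round succeeds on all tasks. The toy only illustrates the control flow: stage 2 reads the accumulated knowledge, never the environment.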

Abstract

The prevailing paradigm for improving large language models relies on offline training with human annotations or simulated environments, leaving the rich experience accumulated during real-world deployment entirely unexploited. We propose Online Experiential Learning (OEL), a framework that enables language models to continuously improve from their own deployment experience. OEL operates in two stages: first, transferable experiential knowledge is extracted and accumulated from interaction trajectories collected on the user side; second, this knowledge is consolidated into model parameters via on-policy context distillation, requiring no access to the user-side environment. The two stages are iterated to form an online learning loop, where the improved model collects higher-quality trajectories that yield richer experiential knowledge for subsequent rounds. We evaluate OEL on text-based game environments across multiple model scales and both thinking and non-thinking variants. OEL achieves consistent improvements over successive iterations, enhancing both task accuracy and token efficiency while preserving out-of-distribution performance. Our analysis further shows that extracted experiential knowledge is significantly more effective than raw trajectories, and that on-policy consistency between the knowledge source and the policy model is critical for effective learning.
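The consolidation stage can be illustrated with a per-token objective. The abstract does not give the paper's exact loss, so the following is an assumption borrowed from common on-policy distillation setups: the teacher is the same model with the extracted experiential knowledge placed in its context, the student is the model without it, responses are sampled from the student (hence "on-policy"), and the loss is the mean reverse KL divergence KL(student || teacher) over the positions of the sampled response. All function names are hypothetical, and logits are passed in directly rather than produced by a real model.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax over one vocabulary distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reverse_kl(p_student: Sequence[float], p_teacher: Sequence[float]) -> float:
    # KL(student || teacher) = sum_v p_s(v) * (log p_s(v) - log p_t(v)).
    return sum(
        ps * (math.log(ps) - math.log(pt))
        for ps, pt in zip(p_student, p_teacher)
        if ps > 0.0
    )


def context_distillation_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
) -> float:
    """Mean per-token reverse KL over one student-sampled response (assumed objective).

    student_logits[t]: next-token logits of the model WITHOUT the knowledge in context.
    teacher_logits[t]: logits of the same model WITH the knowledge prepended,
    both computed on the same prefix of the response the student itself sampled.
    """
    if len(student_logits) != len(teacher_logits) or not student_logits:
        raise ValueError("need one non-empty teacher row per student row")
    kls = [
        reverse_kl(softmax(s), softmax(t))
        for s, t in zip(student_logits, teacher_logits)
    ]
    return sum(kls) / len(kls)


# Toy usage: two response positions over a three-token vocabulary.
student = [[0.0, 0.0, 0.0], [2.0, 0.5, -1.0]]
teacher = [[1.0, 0.0, -1.0], [2.0, 0.5, -1.0]]
print(context_distillation_loss(student, teacher))
```

Positions where the knowledge-conditioned teacher agrees with the student contribute zero; positions where it disagrees contribute positive loss. Under this assumption, minimizing the loss with respect to the student's parameters (done with an autodiff framework in practice, not shown here) is what moves the knowledge from the context into the weights. Evaluating the divergence on the student's own samples, rather than on fixed trajectories, is what makes the distillation on-policy.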
