APEX-EM: Non-Parametric Online Learning for Autonomous Agents via Structured Procedural-Episodic Experience Replay

arXiv cs.AI / 4/1/2026


Key Points

  • The paper introduces APEX-EM, a non-parametric online learning framework for LLM-based autonomous agents that reuses prior procedural plans via structured procedural-episodic experience replay without updating model weights.
  • APEX-EM defines a structured experience representation capturing planning steps, artifacts, iteration history with error analysis, and quality scores, and uses a PRGII workflow with task verifiers to generate multi-dimensional reward signals.
  • It also proposes a dual-outcome experience memory that performs hybrid retrieval using semantic search, structural signature matching, and plan-DAG traversal to enable transfer across tasks with little/no lexical overlap but similar operational structure.
  • Experiments on BigCodeBench, KGQAGen-10k, and Humanity’s Last Exam show large gains in accuracy and success rate (SR) from memory, including 89.6% vs. 41.3% accuracy on KGQAGen-10k and an 83.3% vs. 53.9% SR on BigCodeBench, with ablations indicating that the usefulness of feedback depends on task type.
  • The approach treats successful executions as positive in-context examples and failures as negative examples annotated with structured error information to improve iterative planning and reuse over time.
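The structured experience representation and the dual-outcome positive/negative labeling described above can be sketched as a simple record type. The field names, schema, and rendering format below are illustrative assumptions, not the paper's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Experience:
    """Hypothetical procedural-episodic trace of one task execution.

    Mirrors the components named in the paper (planning steps, artifacts,
    iteration history with error analysis, quality scores), but the exact
    schema is an assumption for illustration.
    """
    task_description: str
    plan_steps: list[str]                  # ordered planning steps
    artifacts: dict[str, str]              # e.g. {"solution.py": "<code>"}
    iterations: list[dict] = field(default_factory=list)  # per-attempt error analysis
    quality_score: float = 0.0             # verifier-derived reward signal
    succeeded: bool = False                # dual outcome: positive vs. negative example


def as_in_context_example(exp: Experience) -> str:
    """Render an experience as a prompt snippet: successes become positive
    in-context examples, failures become negative examples annotated with
    their structured error information."""
    label = "POSITIVE EXAMPLE" if exp.succeeded else "NEGATIVE EXAMPLE"
    lines = [f"[{label}] {exp.task_description}"]
    lines += [f"  step {i + 1}: {step}" for i, step in enumerate(exp.plan_steps)]
    if not exp.succeeded:
        for attempt in exp.iterations:
            lines.append(f"  error: {attempt.get('error_analysis', 'n/a')}")
    return "\n".join(lines)
```

A success would be retrieved and injected verbatim as a worked example, while a failure contributes its error annotations as something for the planner to avoid.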

Abstract

LLM-based autonomous agents lack persistent procedural memory: they re-derive solutions from scratch even when structurally identical tasks have been solved before. We present APEX-EM, a non-parametric online learning framework that accumulates, retrieves, and reuses structured procedural plans without modifying model weights. APEX-EM introduces: (1) a structured experience representation encoding the full procedural-episodic trace of each execution (planning steps, artifacts, iteration history with error analysis, and quality scores); (2) a Plan-Retrieve-Generate-Iterate-Ingest (PRGII) workflow with Task Verifiers providing multi-dimensional reward signals; and (3) a dual-outcome Experience Memory with hybrid retrieval combining semantic search, structural signature matching, and plan-DAG traversal, enabling cross-domain transfer between tasks that share no lexical overlap but have analogous operational structure. Successful experiences serve as positive in-context examples; failures serve as negative examples with structured error annotations. We evaluate on BigCodeBench [zhuo2025bigcodebench], KGQAGen-10k [zhang2025kgqagen], and Humanity's Last Exam [phan2025hle] using Claude Sonnet 4.5 and Opus 4.5. On KGQAGen-10k, APEX-EM achieves 89.6% accuracy versus 41.3% without memory (+48.3pp), surpassing the oracle-retrieval upper bound (84.9%). On BigCodeBench, it reaches an 83.3% success rate (SR) from a 53.9% baseline (+29.4pp), exceeding MemRL's [memrl2025] +11.0pp gain under comparable frozen-backbone conditions (with backbone differences controlled for in our analysis). On HLE, entity-graph retrieval reaches 48.0% from 25.2% (+22.8pp). Ablations show that component value is task-dependent: rich judge feedback is negligible for code generation but critical for structured queries (+10.3pp), while binary-signal iteration partially compensates for weaker feedback.
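The hybrid retrieval in the Experience Memory combines three signals: semantic similarity, structural signature matching, and plan-DAG overlap. The sketch below is a minimal illustration of that combination under stated assumptions: token-Jaccard similarity stands in for real embedding search, the signature and DAG encodings are made up, and the weights are arbitrary. None of this is the paper's implementation:

```python
def jaccard(a: set, b: set) -> float:
    """Overlap coefficient used as a cheap stand-in for learned similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0


def semantic_score(query_text: str, exp_text: str) -> float:
    # Assumption: token overlap substitutes for embedding-based semantic search.
    return jaccard(set(query_text.lower().split()), set(exp_text.lower().split()))


def signature_score(sig_q: tuple, sig_d: tuple) -> float:
    # Hypothetical structural signature: an ordered tuple of abstract
    # operation types; exact match scores 1.0, else set overlap.
    return 1.0 if sig_q == sig_d else jaccard(set(sig_q), set(sig_d))


def _edges(dag: dict) -> set:
    return {(u, v) for u, children in dag.items() for v in children}


def dag_score(dag_q: dict, dag_d: dict) -> float:
    # Plan-DAG traversal approximated as the fraction of shared edges.
    return jaccard(_edges(dag_q), _edges(dag_d))


def hybrid_score(query: dict, experience: dict, w=(0.4, 0.3, 0.3)) -> float:
    """Weighted blend of the three retrieval signals (weights are arbitrary)."""
    return (w[0] * semantic_score(query["text"], experience["text"])
            + w[1] * signature_score(query["sig"], experience["sig"])
            + w[2] * dag_score(query["dag"], experience["dag"]))
```

The point of the blend is visible on two tasks with almost no shared vocabulary but identical operational structure (e.g. "fetch weather data and plot it" vs. "download stock prices and chart them"): the semantic term is near zero, yet the signature and DAG terms dominate, so the structurally analogous experience is still retrieved.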