Drawing on Memory: Dual-Trace Encoding Improves Cross-Session Recall in LLM Agents

arXiv cs.AI / April 15, 2026


Key Points

  • The paper argues that persistent-memory LLM agents often store information as flat facts, limiting temporal reasoning, change tracking, and cross-session aggregation.
  • It proposes “dual-trace encoding,” where each stored fact is paired with a concrete scene trace (a narrative reconstruction of when and under what context the information was learned) to make memories more distinctive.
  • Experiments on the LongMemEval-S benchmark (4,575 sessions, 100 recall questions) show dual-trace outperforms a fact-only control, achieving 73.7% vs 53.5% overall accuracy (+20.2 pp, statistically significant).
  • The improvement is concentrated in temporal reasoning (+40 pp), knowledge-update tracking (+25 pp), and multi-session aggregation (+30 pp), with no gain for single-session retrieval, aligning with encoding specificity theory.
  • Token-level analysis indicates the accuracy gains come without additional token cost, and the authors outline an approach to adapt the method to coding agents with preliminary pilot results.
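The dual-trace idea described above can be sketched as a simple data structure: each memory pairs the flat fact with a narrative scene trace, and both are exposed to retrieval so the trace's contextual details make the memory more distinctive. This is an illustrative sketch, not the authors' implementation; all field and method names here are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DualTraceMemory:
    """Hypothetical dual-trace record: a fact plus its scene trace."""
    fact: str           # flat factual record, as in a fact-only store
    scene_trace: str    # narrative reconstruction of when/how it was learned
    session_id: str     # session in which the fact was encoded
    encoded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def to_retrieval_text(self) -> str:
        """Concatenate both traces so retrieval sees the contextual details."""
        return f"{self.fact}\n[scene] {self.scene_trace}"

mem = DualTraceMemory(
    fact="User's cat is named Miso.",
    scene_trace=(
        "Learned during an evening chat while the user was planning a vet "
        "visit; they mentioned Miso in passing mid-conversation."
    ),
    session_id="s-042",
)
print(mem.to_retrieval_text())
```

Committing to concrete details in `scene_trace` at encoding time is what the paper argues makes each memory distinctive enough to support temporal reasoning and change tracking later.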

Abstract

LLM agents with persistent memory store information as flat factual records, providing little context for temporal reasoning, change tracking, or cross-session aggregation. Inspired by the drawing effect [3], we introduce dual-trace memory encoding. In this method, each stored fact is paired with a concrete scene trace, a narrative reconstruction of the moment and context in which the information was learned. The agent is forced to commit to specific contextual details during encoding, creating richer, more distinctive memory traces. Using the LongMemEval-S benchmark (4,575 sessions, 100 recall questions), we compare dual-trace encoding against a fact-only control with matched coverage and format over 99 shared questions. Dual-trace achieves 73.7% overall accuracy versus 53.5%, a +20.2 percentage point (pp) gain (95% CI: [+12.1, +29.3], bootstrap p < 0.0001). Gains concentrate in temporal reasoning (+40pp), knowledge-update tracking (+25pp), and multi-session aggregation (+30pp), with no benefit for single-session retrieval, consistent with encoding specificity theory [8]. Token analysis shows dual-trace encoding achieves this gain at no additional token cost. We additionally sketch an architectural design for adapting dual-trace encoding to coding agents, with preliminary pilot validation.
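The abstract's paired comparison over 99 shared questions with a bootstrap confidence interval can be sketched as follows. This is a generic percentile bootstrap on the accuracy difference, not the authors' evaluation code, and the per-question outcomes below are synthetic stand-ins, not the paper's data.

```python
import random

def bootstrap_ci(dual, fact, n_boot=10_000, seed=0):
    """Percentile 95% CI for mean(dual) - mean(fact), resampling questions.

    `dual` and `fact` are paired per-question outcomes (1 = correct,
    0 = incorrect) for the two conditions on the same question set.
    """
    rng = random.Random(seed)
    n = len(dual)
    diffs = []
    for _ in range(n_boot):
        # Resample question indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        d = sum(dual[i] for i in idx) / n - sum(fact[i] for i in idx) / n
        diffs.append(d)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)]

# Synthetic outcomes for 99 shared questions, roughly matching the
# reported accuracy levels (illustration only).
rng = random.Random(1)
dual = [1 if rng.random() < 0.74 else 0 for _ in range(99)]
fact = [1 if rng.random() < 0.54 else 0 for _ in range(99)]
lo, hi = bootstrap_ci(dual, fact)
print(f"95% CI for accuracy gain: [{lo:+.3f}, {hi:+.3f}]")
```

Resampling whole questions (rather than conditions independently) preserves the pairing between the two systems, which is what makes the interval a test of the per-question difference.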