
D-Mem: A Dual-Process Memory System for LLM Agents

arXiv cs.AI / 3/20/2026


Key Points

  • D-Mem introduces a dual-process memory system for LLM agents that combines lightweight vector retrieval with a high-fidelity Full Deliberation module to preserve long-horizon context.
  • It uses a Multi-dimensional Quality Gating policy to dynamically switch between fast retrieval and exhaustive deliberation to balance accuracy and computational cost.
  • The approach addresses lossy context in standard retrieval-based memory by maintaining an exhaustive fallback for fine-grained queries.
  • Experimental results on LoCoMo and RealTalk show the policy achieving an F1 of 53.5 on LoCoMo with GPT-4o-mini, outperforming the Mem0 baseline and recovering 96.7% of full deliberation performance at lower cost.
  • The work demonstrates favorable trade-offs between memory fidelity and compute, suggesting practical efficiency improvements for persistent, self-adapting agents.
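The dual-process flow described above can be sketched in a few lines. This is a hypothetical illustration, not code from the paper: the function names (`fast_retrieve`, `full_deliberate`), the three gating dimensions, and the threshold value are all illustrative assumptions about how a multi-dimensional quality gate between a cheap retrieval path and an exhaustive fallback might look.

```python
# Hypothetical sketch of dual-process memory lookup with quality gating.
# All names, scoring dimensions, and thresholds are illustrative assumptions,
# not the D-Mem paper's actual implementation.
from dataclasses import dataclass

@dataclass
class GateScores:
    relevance: float    # semantic similarity of retrieved memories to the query
    coverage: float     # fraction of query entities found in retrieved memories
    consistency: float  # agreement among the top retrieved memories

def gate_passes(scores: GateScores, threshold: float = 0.6) -> bool:
    """Multi-dimensional gate: every dimension must clear the threshold."""
    return min(scores.relevance, scores.coverage, scores.consistency) >= threshold

def answer(query, fast_retrieve, full_deliberate, score_fn):
    """Try the cheap vector-retrieval path first; fall back to exhaustive
    deliberation only when the quality gate rejects the retrieved context."""
    memories = fast_retrieve(query)              # lightweight vector retrieval
    if gate_passes(score_fn(query, memories)):
        return memories                          # fast path suffices
    return full_deliberate(query)                # high-fidelity fallback
```

The gate is the cost lever: most routine queries exit on the fast path, and only queries whose retrieved context scores poorly on any dimension pay for full deliberation.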

Abstract

Driven by the development of persistent, self-adapting autonomous agents, equipping these systems with high-fidelity memory access for long-horizon reasoning has emerged as a critical requirement. However, prevalent retrieval-based memory frameworks often follow an incremental processing paradigm that continuously extracts and updates conversational memories into vector databases, relying on semantic retrieval when queried. While this approach is fast, it inherently relies on lossy abstraction, frequently missing contextually critical information and struggling to resolve queries that rely on fine-grained contextual understanding. To address this, we introduce D-Mem, a dual-process memory system. It retains lightweight vector retrieval for routine queries while establishing an exhaustive Full Deliberation module as a high-fidelity fallback. To achieve cognitive economy without sacrificing accuracy, D-Mem employs a Multi-dimensional Quality Gating policy to dynamically bridge these two processes. Experiments on the LoCoMo and RealTalk benchmarks using GPT-4o-mini and Qwen3-235B-Instruct demonstrate the efficacy of our approach. Notably, our Multi-dimensional Quality Gating policy achieves an F1 score of 53.5 on LoCoMo with GPT-4o-mini. This outperforms our static retrieval baseline, Mem0* (51.2), and recovers 96.7% of the Full Deliberation's performance (55.3), while incurring significantly lower computational costs.