EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory

arXiv cs.CL / 5/1/2026

💬 Opinion · Tools & Practical Usage · Models & Research

Key Points

  • The paper introduces EviMem, a method for long-term conversational memory that performs evidence-gap-driven iterative retrieval rather than relying on single-pass retrieval or untargeted query refinement.
  • EviMem uses a closed-loop framework (IRIS) that evaluates whether the accumulated retrieval set is sufficient, diagnoses what evidence is still missing, and refines the query to target that gap.
  • It also proposes LaceMem, a layered coarse-to-fine memory hierarchy that supports fine-grained diagnosis of evidence gaps across sessions.
  • Experiments on LoCoMo show EviMem improves Judge Accuracy versus MIRIX for temporal questions (73.3% to 81.6%) and multi-hop questions (65.9% to 85.2%) while achieving 4.5× lower latency.
  • The authors provide an implementation via a GitHub repository, enabling replication and further development of the approach.
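The IRIS loop described above (retrieve, check sufficiency, diagnose the gap, refine the query) can be sketched as a small driver function. This is an illustrative reconstruction, not the paper's implementation: `retrieve`, `is_sufficient`, `diagnose_gap`, and `refine_query` are hypothetical callables standing in for the framework's components.

```python
def iris_retrieve(question, retrieve, is_sufficient, diagnose_gap,
                  refine_query, max_rounds=3):
    """Evidence-gap-driven retrieval loop (illustrative sketch only).

    All four callables are hypothetical placeholders, not the paper's API:
    - retrieve(query) -> list of evidence strings
    - is_sufficient(question, evidence) -> bool
    - diagnose_gap(question, evidence) -> description of what is missing
    - refine_query(question, gap) -> a new query targeting the gap
    """
    query, evidence = question, []
    for _ in range(max_rounds):
        for doc in retrieve(query):
            if doc not in evidence:          # accumulate without duplicates
                evidence.append(doc)
        if is_sufficient(question, evidence):
            break                            # evidence gap is closed
        gap = diagnose_gap(question, evidence)
        query = refine_query(question, gap)  # targeted, not blind, refinement
    return evidence
```

The key contrast with untargeted refinement is that the new query is derived from an explicit diagnosis of what is missing from the accumulated set, so each round retrieves toward the residual gap rather than re-querying the original question.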

Abstract

Long-term conversational memory requires retrieving evidence scattered across multiple sessions, yet single-pass retrieval fails on temporal and multi-hop questions. Existing iterative methods refine queries via generated content or document-level signals, but none explicitly diagnoses the evidence gap, namely what is missing from the accumulated retrieval set, leaving query refinement untargeted. We present EviMem, combining IRIS (Iterative Retrieval via Insufficiency Signals), a closed-loop framework that detects evidence gaps through sufficiency evaluation, diagnoses what is missing, and drives targeted query refinement, with LaceMem (Layered Architecture for Conversational Evidence Memory), a coarse-to-fine memory hierarchy supporting fine-grained gap diagnosis. On LoCoMo, EviMem improves Judge Accuracy over MIRIX on temporal (73.3% to 81.6%) and multi-hop (65.9% to 85.2%) questions at 4.5x lower latency. Code: https://github.com/AIGeeksGroup/EviMem.