Back to Basics: Let Conversational Agents Remember with Just Retrieval and Generation

arXiv cs.AI / 4/15/2026


Key Points

  • The paper argues that conversational memory degradation comes less from complex memory architectures and more from a “Signal Sparsity Effect” that makes relevant information harder to aggregate as dialogues lengthen.
  • It identifies two drivers of failure—Decisive Evidence Sparsity (relevant signals become isolated) and Dual-Level Redundancy (both inter-session interference and intra-session filler add non-informative content).
  • To address this, the authors propose a minimalist framework, using only retrieval and generation, with Turn Isolation Retrieval (TIR) to capture turn-level evidence via max-activation.
  • They further introduce Query-Driven Pruning (QDP) to remove redundant sessions and conversational filler, producing a compact, high-density evidence set for generation.
  • Experiments across multiple benchmarks show the proposed approach outperforms strong baselines while improving token and latency efficiency, presenting a new minimalist baseline for conversational memory.
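The max-activation idea behind TIR can be sketched with toy unit-vector embeddings: instead of aggregating (e.g. averaging) turn scores across a session, a session is scored by its single best-matching turn, so one decisive turn is not diluted by surrounding filler. This is an illustrative reconstruction, not the paper's implementation; the function names and the cosine-similarity scoring are assumptions.

```python
from math import sqrt

def unit(v):
    """Normalize a vector to unit length (toy stand-in for an embedding)."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))

def tir_score(query_vec, turn_vecs):
    """Max-activation: a session scores as its single best-matching turn,
    so one isolated decisive turn is not averaged away."""
    return max(dot(query_vec, t) for t in turn_vecs)

def tir_retrieve(query_vec, sessions, k=1):
    """Rank sessions by their max-activation turn score; return top-k ids."""
    ranked = sorted(sessions,
                    key=lambda sid: tir_score(query_vec, sessions[sid]),
                    reverse=True)
    return ranked[:k]
```

On a toy example, a session containing one highly relevant turn among irrelevant ones outranks a uniformly mediocre session under max-activation, while mean aggregation would rank them the other way around, which is exactly the failure mode Decisive Evidence Sparsity describes.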

Abstract

Existing conversational memory systems rely on complex hierarchical summarization or reinforcement learning to manage long-term dialogue history, yet remain vulnerable to context dilution as conversations grow. In this work, we offer a different perspective: the primary bottleneck may lie not in memory architecture, but in the *Signal Sparsity Effect* within the latent knowledge manifold. Through controlled experiments, we identify two key phenomena: *Decisive Evidence Sparsity*, where relevant signals become increasingly isolated with longer sessions, leading to sharp degradation in aggregation-based methods; and *Dual-Level Redundancy*, where both inter-session interference and intra-session conversational filler introduce large amounts of non-informative content, hindering effective generation. Motivated by these insights, we propose a minimalist framework that brings conversational memory back to basics, relying solely on retrieval and generation via Turn Isolation Retrieval (TIR) and Query-Driven Pruning (QDP). TIR replaces global aggregation with a max-activation strategy to capture turn-level signals, while QDP removes redundant sessions and conversational filler to construct a compact, high-density evidence set. Extensive experiments on multiple benchmarks demonstrate that the proposed framework achieves robust performance across diverse settings, consistently outperforming strong baselines while maintaining high efficiency in tokens and latency, establishing a new minimalist baseline for conversational memory.
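The two levels of QDP described above can likewise be sketched as a two-threshold filter: whole sessions whose best turn falls below a session-level threshold are dropped (inter-session redundancy), and within surviving sessions, individual low-relevance filler turns are dropped (intra-session redundancy). The thresholds, data layout, and function names here are illustrative assumptions, not the paper's actual procedure.

```python
from math import sqrt

def unit(v):
    """Normalize a vector to unit length (toy stand-in for an embedding)."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))

def qdp(query_vec, sessions, session_tau=0.5, turn_tau=0.3):
    """Query-Driven Pruning sketch. `sessions` maps session ids to lists of
    (turn_text, turn_vec) pairs; thresholds are illustrative.

    Level 1: drop whole sessions whose best turn scores below session_tau.
    Level 2: within kept sessions, drop filler turns below turn_tau.
    Returns a compact evidence set: session id -> surviving turn texts."""
    evidence = {}
    for sid, turns in sessions.items():
        sims = [dot(query_vec, vec) for _, vec in turns]
        if max(sims) < session_tau:
            continue  # inter-session redundancy: prune the whole session
        kept = [text for (text, _), s in zip(turns, sims) if s >= turn_tau]
        evidence[sid] = kept  # intra-session filler removed
    return evidence
```

The surviving evidence set would then be passed directly to the generator, which is what makes the pipeline "just retrieval and generation" with no summarization or learned memory controller in between.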