MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents

arXiv cs.AI / 5/4/2026


Key Points

  • The paper introduces MemRouter, a “write-side” memory routing approach for long-term conversational agents that decides which turns to store without running autoregressive memory-management generation at every turn.
  • MemRouter uses an embedding-based routing policy: it encodes each turn together with recent context, projects the resulting embeddings through a frozen LLM backbone, and predicts whether to admit the turn to external memory with lightweight classification heads, training only 12M parameters.
  • In matched-harness experiments on LoCoMo, with the retrieval pipeline, prompts, and QA backbone (Qwen2.5-7B) held constant, MemRouter raises overall F1 from 45.6 to 52.0 over an LLM-based memory manager, with non-overlapping 95% confidence intervals.
  • MemRouter also substantially reduces memory-management latency, cutting p50 from 970ms to 58ms, and ablations show that learned admission provides the largest gain, followed by category-specific prompting and retrieval.
  • The work supports a modular design for long-horizon conversational QA, suggesting that memory admission can be optimized via a small supervised router while answer generation stays as a separate downstream component.
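The routing policy described above can be pictured as a small trainable head sitting on top of frozen backbone embeddings. The sketch below is a minimal, hypothetical PyTorch illustration: the layer sizes, the pooling, and the `MemRouterHead` name are assumptions (the paper only states that a frozen LLM backbone produces the embeddings and that roughly 12M parameters are trained), not the authors' implementation.

```python
import torch
import torch.nn as nn

class MemRouterHead(nn.Module):
    """Lightweight admission head over frozen-backbone embeddings.

    Illustrative sketch only: dimensions are assumptions, and this toy head
    trains far fewer than the ~12M parameters reported for the real router.
    """
    def __init__(self, hidden_dim: int = 3584, proj_dim: int = 512):
        super().__init__()
        # Project pooled backbone hidden states to a small routing space.
        self.proj = nn.Linear(hidden_dim, proj_dim)
        # Binary decision: admit the turn to external memory, or skip it.
        self.classifier = nn.Sequential(nn.GELU(), nn.Linear(proj_dim, 2))

    def forward(self, turn_emb: torch.Tensor) -> torch.Tensor:
        # turn_emb: (batch, hidden_dim) pooled embedding of a turn
        # encoded together with its recent context.
        return self.classifier(self.proj(turn_emb))

torch.manual_seed(0)
head = MemRouterHead()
# Stand-in for pooled hidden states from a frozen LLM backbone;
# 3584 matches Qwen2.5-7B's hidden size, but any backbone would do.
turn_emb = torch.randn(4, 3584)
logits = head(turn_emb)
admit = logits.argmax(dim=-1).bool()  # True -> write the turn to memory
print(admit.shape)
```

Because the head is a single forward pass over precomputed embeddings rather than autoregressive decoding, per-turn admission cost is a matrix multiply or two, which is consistent with the large p50 latency reduction the paper reports.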

Abstract

Long-term conversational agents must decide which turns to store in external memory, yet recent systems rely on autoregressive LLM generation at every turn to make that decision. We present MemRouter, a write-side memory router that decouples memory admission from the downstream answer backbone and replaces per-turn memory-management decoding with an embedding-based routing policy. MemRouter encodes each turn together with recent context, projects the resulting embeddings through a frozen LLM backbone, and predicts whether the turn should be stored using lightweight classification heads while training only 12M parameters. Under a controlled matched-harness comparison on LoCoMo, where the retrieval pipeline, answer prompts, and QA backbone (Qwen2.5-7B) are held identical, MemRouter outperforms an LLM-based memory manager on every question category (overall F1 52.0 vs 45.6, non-overlapping 95% CIs) while reducing memory-management p50 latency from 970ms to 58ms. Descriptive factorial averaging further shows that learned admission improves mean F1 by +10.3 over random storage, category-specific prompting adds +5.2 over a generic prompt, and retrieval contributes +0.7. These results suggest that write-side memory admission can be learned by a small supervised router, while answer generation remains a separate downstream component in long-horizon conversational QA.
