GRAVITY: Architecture-Agnostic Structured Anchoring for Long-Horizon Conversational Memory

arXiv cs.CL / 5/5/2026

📰 News · Models & Research

Key Points

  • The GRAVITY method introduces a plug-and-play structured memory module for long-horizon conversational agents, aiming to add relational, temporal, and thematic structure to retrieved context.
  • It derives three representations from raw dialogue: entity profiles using relational graphs, temporal event tuples organized into causal traces, and cross-session topic summaries.
  • During generation, GRAVITY injects these representations into the host model’s prompt as structured “anchoring” contexts without requiring any changes to the host model architecture.
  • Experiments on LongMemEval and LoCoMo across five different memory systems show consistent improvements, with average gains of 7.5–10.1% in LLM-judge accuracy.
  • The performance gains are larger for weaker baselines (about 12.2% for the weakest host) and smaller but still positive for strong baselines (3.8–5.7%), suggesting broad applicability.
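The injection mechanism described above can be sketched in a few lines. The sketch below is illustrative only: the class names, fields, and section headers are assumptions, not the paper's actual implementation. It shows the general idea of rendering entity profiles, temporal event tuples, and topic summaries as a structured text block that is simply prepended to the host model's prompt, which is why no architectural changes are needed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EntityProfile:
    """An entity plus its edges from a relational graph (names are illustrative)."""
    name: str
    relations: list  # list of (relation, other_entity) pairs

@dataclass
class EventTuple:
    """A time-stamped event; caused_by links it into a causal trace."""
    timestamp: str
    subject: str
    action: str
    obj: str
    caused_by: Optional[int] = None  # index of a preceding event, if any

def build_anchoring_context(entities, events, topics, query):
    """Render the three representations as one structured text block that a
    host system could prepend to its prompt at generation time."""
    lines = ["[Entity profiles]"]
    for e in entities:
        rels = "; ".join(f"{rel} -> {other}" for rel, other in e.relations)
        lines.append(f"- {e.name}: {rels}")
    lines.append("[Temporal events]")
    for ev in events:
        cause = f" (caused by event {ev.caused_by})" if ev.caused_by is not None else ""
        lines.append(f"- [{ev.timestamp}] {ev.subject} {ev.action} {ev.obj}{cause}")
    lines.append("[Cross-session topics]")
    lines.extend(f"- {t}" for t in topics)
    lines.append(f"[Query] {query}")
    return "\n".join(lines)
```

For example, feeding in one entity, one event, and one topic summary yields a compact anchoring block ending in the user's query, ready to be concatenated with whatever context the host's own retriever produced.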

Abstract

Long-horizon conversational agents rely on memory systems with increasingly sophisticated retrieval mechanisms. However, retrieved fragments are typically fed to the language model as unstructured text, lacking the relational, temporal, and thematic structures essential for complex reasoning. To bridge this reasoning gap, we introduce GRAVITY (**G**eneration-time **R**elational **A**nchoring **V**ia **I**njected **T**opological Memor**Y**), a plug-and-play structured memory module. GRAVITY extracts three complementary knowledge representations from raw conversational utterances: entity profiles grounded in relational graphs, temporal event tuples linked into causal traces, and cross-session topic summaries. At generation time, it injects these representations into the host system's prompt as structured anchoring contexts. This approach effectively synthesizes scattered evidence into a coherent, query-relevant context without requiring any architectural modifications to the host model. Extensive evaluations across five diverse memory systems on the LongMemEval and LoCoMo benchmarks demonstrate the efficacy of our approach. On average, GRAVITY improves LLM-judge accuracy by 7.5–10.1%. Gains are inversely correlated with baseline strength: the weakest host improves by 12.2% while the strongest still gains 3.8–5.7%. These findings establish structured context anchoring as a broadly effective, architecture-agnostic augmentation paradigm for long-horizon conversational memory.