Memori: A Persistent Memory Layer for Efficient, Context-Aware LLM Agents

arXiv cs.LG / 3/23/2026


Key Points

  • Memori provides an LLM-agnostic persistent memory layer that avoids vendor lock-in and large prompt injections by storing memory as structured representations.
  • It uses an Advanced Augmentation pipeline to convert unstructured dialogue into compact semantic triples and conversation summaries for precise retrieval and coherent reasoning.
  • On the LoCoMo benchmark, Memori achieves 81.95% accuracy and uses about 1,294 tokens per query, roughly 5% of full context, yielding substantial efficiency gains.
  • Memori uses about 67% fewer tokens than competing memory systems and achieves over 20x token savings versus full-context prompting, a substantial cost reduction.
  • The work argues that effective memory for LLM agents relies on structured representations rather than simply expanding context windows, enabling scalable deployment across multi-session interactions.
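The core idea in the points above is that memory becomes cheap to query once dialogue is distilled into structured records. Below is a minimal, hypothetical sketch of that pattern: the class and method names (`MemoryStore`, `add_triple`, `retrieve`) are illustrative assumptions, not Memori's actual API, and the keyword-overlap retrieval is a toy stand-in for the paper's Advanced Augmentation pipeline.

```python
# Toy sketch of a structured memory layer: store (subject, predicate, object)
# triples plus conversation summaries, and retrieve only the triples relevant
# to a query instead of injecting raw conversation history into the prompt.
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    triples: list = field(default_factory=list)    # semantic triples
    summaries: list = field(default_factory=list)  # per-session summaries

    def add_triple(self, subj: str, pred: str, obj: str) -> None:
        self.triples.append((subj, pred, obj))

    def add_summary(self, text: str) -> None:
        self.summaries.append(text)

    def retrieve(self, query: str) -> list:
        """Return triples sharing at least one word with the query,
        so the prompt gets a compact slice of memory."""
        q = set(query.lower().split())
        return [t for t in self.triples
                if q & {w.lower() for part in t for w in part.split()}]

store = MemoryStore()
store.add_triple("Alice", "works_at", "Acme Corp")
store.add_triple("Alice", "prefers", "morning meetings")
store.add_summary("Session 1: Alice discussed her schedule and employer.")

hits = store.retrieve("Where does Alice work")
```

Feeding only `hits` (plus a short summary) to the model is what keeps per-query token counts small; a production system would use learned extraction and embedding-based retrieval rather than string overlap.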

Abstract

As large language models (LLMs) evolve into autonomous agents, persistent memory at the API layer is essential for enabling context-aware behavior across LLMs and multi-session interactions. Existing approaches force vendor lock-in and rely on injecting large volumes of raw conversation into prompts, leading to high token costs and degraded performance. We introduce Memori, an LLM-agnostic persistent memory layer that treats memory as a data structuring problem. Its Advanced Augmentation pipeline converts unstructured dialogue into compact semantic triples and conversation summaries, enabling precise retrieval and coherent reasoning. Evaluated on the LoCoMo benchmark, Memori achieves 81.95% accuracy, outperforming existing memory systems while using only 1,294 tokens per query (~5% of full context). This results in substantial cost reductions, including 67% fewer tokens than competing approaches and over 20x savings compared to full-context methods. These results show that effective memory in LLM agents depends on structured representations instead of larger context windows, enabling scalable and cost-efficient deployment.
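The reported figures are internally consistent, which a quick back-of-the-envelope check makes clear. The full-context size below is inferred from the stated "~5% of full context", not a number given in the summary.

```python
# Sanity-check the reported efficiency numbers.
tokens_per_query = 1294       # Memori's reported tokens per query
fraction_of_full = 0.05       # "~5% of full context"

# Implied full-context prompt size (inferred, not stated in the paper summary)
full_context = tokens_per_query / fraction_of_full   # ~25,880 tokens

# Token savings factor versus full-context prompting
savings = full_context / tokens_per_query            # ~20x
```

A 5% token footprint directly implies the "over 20x savings" claim, since 1 / 0.05 = 20.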