MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents

arXiv cs.AI / 4/7/2026

Key Points

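MemMachine is proposed as a ground-truth-preserving memory system that supports personalization and long-horizon reasoning for LLM agents across multiple sessions. Its design stores entire conversational episodes and reduces lossy LLM-based extraction.
To improve recall when relevant evidence spans multiple dialogue turns rather than a single turn, it uses contextualized retrieval, which expands nucleus matches with their surrounding context.
On benchmarks, it reports 0.9169 on LoCoMo (gpt4.1-mini) and 93.0% accuracy on LongMemEvalS. Retrieval-stage optimizations (retrieval depth tuning, context formatting, search prompt design, query bias correction) contributed more than ingestion-stage improvements such as sentence chunking.
A companion Retrieval Agent adaptively routes each query to direct retrieval, parallel decomposition, or an iterative chain-of-query strategy, reaching 93.2% on HotpotQA-hard and 92.6% on WikiMultiHop under randomized-noise conditions.
On cost, MemMachine uses roughly 80% fewer input tokens than Mem0 under matched conditions, and the smaller GPT-5-mini paired with optimized prompts exceeds GPT-5 by 2.6%, making it the most cost-efficient setup.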
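
The idea of expanding nucleus matches with surrounding context can be illustrated with a minimal sketch. This is not MemMachine's implementation: the word-overlap scorer, the function name `contextualized_retrieve`, and the `top_k` and `window` parameters are all assumptions made for illustration, and a real system would presumably use embedding-based search.

```python
import re
from typing import List, Tuple

def tokens(text: str) -> set:
    """Lowercased word set; a toy stand-in for a real relevance model."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def contextualized_retrieve(episode: List[str], query: str,
                            top_k: int = 1, window: int = 1) -> List[Tuple[int, str]]:
    """Pick the top_k best-matching turns (the nuclei), then add
    `window` neighbouring turns on each side of every nucleus."""
    q = tokens(query)
    scores = [len(q & tokens(turn)) for turn in episode]
    ranked = sorted(range(len(episode)), key=lambda i: scores[i], reverse=True)
    nuclei = [i for i in ranked[:top_k] if scores[i] > 0]

    keep = set()
    for i in nuclei:
        lo = max(0, i - window)
        hi = min(len(episode) - 1, i + window)
        keep.update(range(lo, hi + 1))
    # Return turns in their original conversational order.
    return [(i, episode[i]) for i in sorted(keep)]

# Hypothetical episode: the answer ("Biscuit") shares no words with the query.
EPISODE = [
    "I finally adopted a puppy last week.",
    "Congratulations! What did you call her?",
    "We went with Biscuit.",
    "Great choice.",
    "Unrelated, can you suggest a pasta recipe?",
]

for index, turn in contextualized_retrieve(EPISODE, "What did I call my puppy?"):
    print(index, turn)
```

In this toy episode the nucleus is the assistant's question, and the turn that actually contains the answer is recovered only because it sits inside the context window. With `window=0` the same call returns the nucleus alone, which is the multi-turn failure case the summary describes.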

Abstract

Large Language Model (LLM) agents require persistent memory to maintain personalization, factual continuity, and long-horizon reasoning, yet standard context-window and retrieval-augmented generation (RAG) pipelines degrade over multi-session interactions. We present MemMachine, an open-source memory system that integrates short-term, long-term episodic, and profile memory within a ground-truth-preserving architecture that stores entire conversational episodes and reduces lossy LLM-based extraction. MemMachine uses contextualized retrieval that expands nucleus matches with surrounding context, improving recall when relevant evidence spans multiple dialogue turns. Across benchmarks, MemMachine achieves strong accuracy-efficiency tradeoffs: on LoCoMo it reaches 0.9169 using gpt4.1-mini; on LongMemEvalS (ICLR 2025), a six-dimension ablation yields 93.0 percent accuracy, with retrieval-stage optimizations -- retrieval depth tuning (+4.2 percent), context formatting (+2.0 percent), search prompt design (+1.8 percent), and query bias correction (+1.4 percent) -- outperforming ingestion-stage gains such as sentence chunking (+0.8 percent). GPT-5-mini exceeds GPT-5 by 2.6 percent when paired with optimized prompts, making it the most cost-efficient setup. Compared to Mem0, MemMachine uses roughly 80 percent fewer input tokens under matched conditions. A companion Retrieval Agent adaptively routes queries among direct retrieval, parallel decomposition, or iterative chain-of-query strategies, achieving 93.2 percent on HotpotQA-hard and 92.6 percent on WikiMultiHop under randomized-noise conditions. These results show that preserving episodic ground truth while layering adaptive retrieval yields robust, efficient long-term memory for personalized LLM agents.
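
To make the three retrieval strategies named in the abstract concrete, here is a rough sketch of how their data flow differs. It does not model the Retrieval Agent's routing decision, and nothing here is taken from the MemMachine codebase: the toy `MEMORY`, the subject-matching `retrieve` function, and the `follow_last_entity` helper are hypothetical stand-ins.

```python
from typing import Callable, List, Optional

# Hypothetical toy memory: every fact begins with its subject.
MEMORY = [
    "Alice lives in Lisbon.",
    "Bob lives in Oslo.",
    "Lisbon is the capital of Portugal.",
]

def retrieve(query: str) -> List[str]:
    """Toy retriever: facts whose subject (first word) appears in the query."""
    return [fact for fact in MEMORY if fact.split()[0] in query]

def direct(query: str) -> List[str]:
    """Direct retrieval: a single lookup with the query as given."""
    return retrieve(query)

def parallel_decomposition(sub_queries: List[str]) -> List[str]:
    """Independent sub-queries; results are merged without duplicates."""
    evidence: List[str] = []
    for sub_query in sub_queries:
        for fact in retrieve(sub_query):
            if fact not in evidence:
                evidence.append(fact)
    return evidence

def chain_of_query(first_query: str,
                   next_query: Callable[[List[str]], Optional[str]],
                   max_hops: int = 3) -> List[str]:
    """Iterative chain: each hop's query is derived from the previous hits."""
    evidence: List[str] = []
    query: Optional[str] = first_query
    for _ in range(max_hops):
        if query is None:
            break
        hits = retrieve(query)
        for fact in hits:
            if fact not in evidence:
                evidence.append(fact)
        query = next_query(hits)
    return evidence

def follow_last_entity(hits: List[str]) -> Optional[str]:
    """Toy follow-up rule: query the last word of the newest hit, or stop."""
    return hits[-1].rstrip(".").split()[-1] if hits else None

print(direct("Where does Alice live?"))
print(parallel_decomposition(["Where does Alice live?", "Where does Bob live?"]))
print(chain_of_query("Where does Alice live?", follow_last_entity))
```

The chained call reaches the fact about Lisbon only through the first hop's result, which is why a question like "what country does Alice live in?" cannot be served by a single direct lookup in this toy setup. In a real agent the sub-queries and follow-up queries would presumably be generated by a model rather than by fixed rules.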
