Persistent memory system for LLMs that actually learns mid-conversation

Reddit r/LocalLLaMA / 5/3/2026


Key Points

  • The article describes “MDA,” a persistent memory approach for LLMs aimed at learning during an ongoing conversation rather than starting from scratch each turn.
  • MDA represents knowledge as associative entity networks that update in real time using the Oja rule, avoiding backpropagation and reindexing.
  • Instead of similarity-based retrieval (like standard RAG), MDA retrieves context by activating concept graphs for more connection-based reasoning.
  • The system is presented as model-agnostic and CPU-first, supports multiple LLM providers out of the box (e.g., Ollama/OpenAI/Anthropic), and is delivered as an MCP server with GPU acceleration for batch workloads.
  • The author reports synthetic benchmark results where MDA outperforms RAG on overall accuracy and especially on long-turn retention, and claims multiple agents can share a common memory instance for shared reasoning.

Every LLM conversation starts from zero. RAG helps, but it can't learn from what's happening right now. MDA is my attempt to fix that.

MDA encodes knowledge as associative entity networks, updates them in real time via the Oja rule (no backprop, no reindexing), and retrieves context by activating concept graphs rather than running a similarity search. It runs CPU-first, is model-agnostic, and works with Ollama/OpenAI/Anthropic out of the box. It's available as an MCP server, with GPU acceleration support for batch workloads.
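For anyone unfamiliar with it, Oja's rule is a Hebbian update with a built-in decay term that keeps weight norms bounded, which is what lets weights be updated online indefinitely without a separate renormalization or reindexing pass. The post doesn't show MDA's actual update, so here is a minimal sketch of the rule itself; the function name, shapes, and learning rate are my own assumptions, not MDA's API:

```python
import numpy as np

def oja_update(w: np.ndarray, x: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """One Oja-rule step (hypothetical helper, not MDA's actual code).

    Expanding lr * y * (x - y * w) gives the Hebbian term lr * y * x
    plus a decay term -lr * y**2 * w; the decay keeps ||w|| bounded,
    so the weights can be updated online forever with no backprop
    and no periodic renormalization.
    """
    y = w @ x                        # post-synaptic activation
    return w + lr * y * (x - y * w)

# Toy usage: refine a weight vector as new inputs stream in mid-conversation
rng = np.random.default_rng(0)
w = rng.normal(size=16)
w /= np.linalg.norm(w)
for _ in range(200):
    x = rng.normal(size=16)          # stand-in for a new mention embedding
    w = oja_update(w, x)
print(round(float(np.linalg.norm(w)), 2))  # stays close to 1.0
```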

One thing I find genuinely interesting: multiple agents can share the same MDA instance and reason over a common memory. Agent A learns something, and Agent B picks it up through associative traversal: not by searching for it, but because the concept network connects them. It starts to feel less like retrieval and more like shared intuition.
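To make that concrete, here is a toy sketch of spreading activation over a shared concept graph; the class, entity names, and parameters below are illustrative assumptions, not MDA's actual interface:

```python
from collections import defaultdict

class SharedMemory:
    """Toy associative store shared by multiple agents (illustrative only)."""

    def __init__(self) -> None:
        self.edges: dict[str, dict[str, float]] = defaultdict(dict)

    def associate(self, a: str, b: str, weight: float = 1.0) -> None:
        """Record an undirected association between two concepts."""
        self.edges[a][b] = weight
        self.edges[b][a] = weight

    def activate(self, seed: str, decay: float = 0.5,
                 threshold: float = 0.1) -> dict[str, float]:
        """Spread activation outward from `seed`; return every concept that fires."""
        activation = {seed: 1.0}
        frontier = [seed]
        while frontier:
            node = frontier.pop()
            for nbr, w in self.edges[node].items():
                a = activation[node] * w * decay
                if a > activation.get(nbr, 0.0) and a > threshold:
                    activation[nbr] = a
                    frontier.append(nbr)
        return activation

mem = SharedMemory()
# Agent A records what it just learned mid-conversation
mem.associate("postgres_outage", "connection_pool_bug")
mem.associate("connection_pool_bug", "pgbouncer_config")
# Agent B activates an adjacent concept and reaches the fact by traversal,
# without ever running a similarity search over the query text
print(mem.activate("postgres_outage"))
# {'postgres_outage': 1.0, 'connection_pool_bug': 0.5, 'pgbouncer_config': 0.25}
```

The point of the toy: Agent B never issues a query that "looks like" Agent A's fact; pgbouncer_config surfaces because the edges Agent A wrote connect it to the seed concept.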

Benchmark chart: https://preview.redd.it/tfo3viz76xyg1.png?width=900&format=png&auto=webp&s=f09d4a3f8a2c0e39316f5a655904d04f6815a401

On the benchmark: these numbers come from synthetic questions I wrote myself, not a community-constructed eval. Take them as directional, not definitive. MDA is not a RAG killer; the goal is to cover what RAG and LLMs leave on the table, not to replace them. If you run your own tests and find different results, I'd genuinely want to hear it.

| Metric | RAG (ChromaDB + bge-large-en-v1.5) | MDA |
|---|---|---|
| Overall accuracy | 67.5% | 82.5% |
| Context per query | baseline | 3.1× less |
| Retention at 200 turns | 0% | 92% |
| A-early (turns 1-10) | 70% | 80% |
| B-mid (turns 25-44) | 90% | 100% |
| C-late (turns 100-119) | 90% | 100% |
| Cross-cluster | 20% | 50% |

Inference model: Qwen3-30B-A3B / Judge model: Claude Haiku

If you'd like to share what you think MDA does well or where it falls short, I'd love to hear it.

Source code: https://github.com/rangle2/mda

submitted by /u/One-Pain6799