AI Navigate

RAG was never memory. This is real memory.

Dev.to / 2026/3/12


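Key points

  • Most AI memory systems still do the same thing: chunk text, embed it, and later retrieve something based on similarity. That works for document search, but it is poorly suited to tracking people, relationships, constraints, and state changes over time.
  • MINNS builds a context graph from conversations and provides a memory system that can query that graph directly. The result is structured memory that handles multi-hop reasoning and state changes properly, rather than raw text retrieval.
  • Setup cost is deliberately small: with MinnsClient, three lines of code are enough to start querying context, which shows how easy integration is.
  • By capturing evolving facts (such as a changed budget or itinerary details), this approach addresses the brittleness of typical memory layers.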

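Most AI memory systems still do the same thing:

  • chunk text
  • embed it
  • retrieve something vaguely similar later
  • hope the model sorts it out

That works for document retrieval.

It breaks when memory needs to track people, relationships, constraints, and state changes over time.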

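Suppose a user says:

  • My budget is €5,000
  • Actually, it is now €7,000
  • My daughter has a nut allergy
  • We are flying from Manchester

You do not want “similar chunks”.
You want the current state, the connected facts, and the ability to reason across them.

That is the gap between retrieval and memory.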
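To make the gap concrete, here is a minimal sketch of similarity-style retrieval over those four statements. The word-overlap score is a deliberately crude stand-in for embedding similarity, and `tokenize`, `overlapScore`, and `retrieveTop` are illustrative names invented for this example, not part of any library.

```typescript
// Toy stand-in for embedding similarity: count word tokens shared with the query.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function overlapScore(query: string, chunk: string): number {
  const queryTokens = tokenize(query);
  return tokenize(chunk).filter((token) => queryTokens.indexOf(token) !== -1).length;
}

// The user's statements, stored as independent chunks with no notion of order or state.
const chunks: string[] = [
  "My budget is 5000 euros",
  "Actually, it is now 7000 euros",
  "My daughter has a nut allergy",
  "We are flying from Manchester",
];

// Return the single best-scoring chunk for a query.
function retrieveTop(query: string): { chunk: string; score: number } {
  const scored = chunks.map((chunk) => ({ chunk, score: overlapScore(query, chunk) }));
  scored.sort((a, b) => b.score - a.score);
  return scored[0];
}

const topHit = retrieveTop("What is my current budget?");
console.log(topHit.chunk); // prints "My budget is 5000 euros"
```

The top hit is the stale budget, because the correction never mentions the word "budget". Real embeddings capture far more than word overlap, but a similarity score on its own still does not encode that a later statement replaced an earlier one.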

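And that is why we built MINNS.

MINNS is a memory system for agents that builds a context graph from conversations and lets you query that graph directly. Not raw text retrieval. Not chunk stuffing. Structured memory that handles multi-hop reasoning and state changes properly.

The best part is how little setup it takes.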

Three lines to get started

import { MinnsClient } from 'minns-sdk';

const client = new MinnsClient({ apiKey: 'your-api-key' });

const answer = await client.query("What's the current trip budget?");

That is the core idea.

Three lines, and you are talking to a memory system designed for actual agent context, not just vector retrieval with better branding.

Why this matters

Most so-called memory layers still behave like search.

They can find a sentence that mentions “budget”.
They can find another sentence that mentions “allergy”.
But they do not naturally understand that:

  • Lily is the daughter
  • Lily has the nut allergy
  • the budget changed
  • the newer budget supersedes the old one
  • Manchester is the departure location for this specific trip context

That is where a lot of agent systems quietly fall apart.

They look good in demos.
They become brittle the second memory has to evolve.

MINNS was built to solve that properly.

You ingest conversations, MINNS extracts structured facts, resolves entities, tracks state transitions, and builds a context graph that your agent can query.

So instead of asking an LLM to reconstruct truth from a pile of text, you query memory as memory.
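One way to picture "querying memory as memory" is a store that keeps every fact but answers with the latest one. The sketch below illustrates supersession in general and is not a description of MINNS internals; the `Fact` shape and the `FactStore` class are hypothetical.

```typescript
// Hypothetical fact shape for illustration; not the MINNS data model.
interface Fact {
  subject: string;
  attribute: string;
  value: string;
  turn: number; // position in the conversation; higher means newer
}

class FactStore {
  private facts: Fact[] = [];

  add(fact: Fact): void {
    this.facts.push(fact);
  }

  // Every fact recorded for a subject/attribute pair, oldest first.
  history(subject: string, attribute: string): Fact[] {
    return this.facts
      .filter((f) => f.subject === subject && f.attribute === attribute)
      .sort((a, b) => a.turn - b.turn);
  }

  // The newest fact for a pair supersedes the older ones.
  current(subject: string, attribute: string): Fact | undefined {
    const past = this.history(subject, attribute);
    return past[past.length - 1];
  }
}

const store = new FactStore();
store.add({ subject: "trip", attribute: "budget_eur", value: "5000", turn: 2 });
store.add({ subject: "trip", attribute: "budget_eur", value: "7000", turn: 3 });

console.log(store.current("trip", "budget_eur")?.value); // prints "7000"
console.log(store.history("trip", "budget_eur").length); // prints 2
```

The old value is kept as history rather than deleted, so the store can answer both "what is the budget now?" and "what was it before?".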

A minimal example

Let’s say we ingest a short travel-planning conversation.

import { MinnsClient } from 'minns-sdk';

// Read the API key from the environment rather than hard-coding it.
const client = new MinnsClient({
  apiKey: process.env.MINNS_API_KEY!,
});

// Ingest one session of a travel-planning conversation.
await client.ingestConversations({
  case_id: 'holiday-planning',
  sessions: [
    {
      session_id: 's1',
      topic: 'trip',
      messages: [
        { role: 'user', content: "I'm planning a trip to the Amalfi Coast for my family." },
        { role: 'user', content: "Budget is about 5000 euros." },
        // This message corrects the budget stated in the previous one.
        { role: 'user', content: "Actually, make that 7000 euros." },
        { role: 'user', content: "My daughter Lily is allergic to nuts." },
        { role: 'user', content: "We live in Manchester, so flights from Manchester Airport." },
      ],
    },
  ],
});

Now query it in natural language:

const budget = await client.query("What's the current budget?");
console.log(budget);

const allergy = await client.query('Does anyone have dietary restrictions?');
console.log(allergy);

const flights = await client.query('Where should they fly from?');
console.log(flights);

That is already a very different developer experience from bolting together embeddings, retrieval, filtering, and prompt gymnastics.

Search specific claims too

Sometimes you want direct fact search rather than a composed answer.

You can do that too:

const claims = await client.searchClaims({
  queryText: 'allergies',
});

console.log('Claims:', claims);

So you have both levels:

  • query() for memory-aware answers across the graph
  • searchClaims() for lower-level fact inspection

That is a much better shape for agent systems.

Your LLM can use memory as a tool instead of pretending a prompt full of retrieved chunks is a memory architecture.

Why we built it this way

We built MINNS because too much of the “AI memory” ecosystem still treats memory as a retrieval problem.

It is not.

Real memory has to deal with:

  • entity resolution
  • connected facts
  • evolving user state
  • supersession
  • multi-hop reasoning

If a system cannot handle those reliably, it is not really memory. It is search.

That distinction shows up fast when you benchmark.

MINNS has performed strongly in benchmarks on multi-hop questions and state-change scenarios, which are exactly the cases where naive retrieval starts to crack. That matters because these are not edge cases. They are the normal shape of user context in any serious agent. A user changes plans, updates constraints, refers to family members indirectly, or asks something that requires multiple connected facts to answer.

That is not exotic. That is Tuesday.
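For readers who have not met the term, a multi-hop question is one whose answer requires following more than one link between facts. The sketch below shows the idea on a tiny hand-built graph. The `Edge` shape, the relation names, and `followPath` are assumptions made for illustration and say nothing about how MINNS stores or traverses its context graph.

```typescript
// Hypothetical triple shape for illustration; not the MINNS graph schema.
type Edge = { from: string; relation: string; to: string };

const edges: Edge[] = [
  { from: "user", relation: "has_daughter", to: "Lily" },
  { from: "Lily", relation: "allergic_to", to: "nuts" },
  { from: "user", relation: "departs_from", to: "Manchester Airport" },
];

// Follow a single relation out of a node.
function neighbors(node: string, relation: string): string[] {
  return edges
    .filter((e) => e.from === node && e.relation === relation)
    .map((e) => e.to);
}

// Follow a sequence of relations, one hop at a time.
function followPath(start: string, relations: string[]): string[] {
  let frontier: string[] = [start];
  for (const relation of relations) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const target of neighbors(node, relation)) {
        next.push(target);
      }
    }
    frontier = next;
  }
  return frontier;
}

// "What is the user's daughter allergic to?" takes two hops: user -> Lily -> nuts.
const allergens = followPath("user", ["has_daughter", "allergic_to"]);
console.log(allergens); // prints [ 'nuts' ]
```

No single sentence in the conversation says "the user's family must avoid nuts". The answer only exists once the two facts are connected.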

Why this is useful for agents

Once memory is queryable like this, your agent loop gets much simpler.

Instead of:

  • retrieve chunks
  • stuff them into the prompt
  • ask the model to work out what matters
  • hope it prefers the newer fact to the older one

you can do:

  • ask memory directly
  • inspect the returned answer or claims
  • continue reasoning only if needed

That is a much cleaner architecture.

Your LLM becomes the orchestrator.
MINNS becomes the memory layer.
Each part does its own job.

A simple agent pattern

Here is the shape:

const userQuestion = 'What should I know before booking restaurants?';

const memoryAnswer = await client.query(userQuestion);

console.log(memoryAnswer);

If you are building a ReAct-style agent, query() becomes one of the most useful tools in the loop.

The model sees the question, decides it needs memory, calls query(), gets back context-aware output, then either answers or keeps going.

That is far better than making the model reconstruct reality from raw retrieval results.
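As a rough sketch of that loop, the code below wires a memory tool into a minimal decide-act cycle. Everything here is a stand-in: `memoryTool` returns a canned string where a real agent might call `client.query()`, `decide` is a hard-coded rule where a real agent would ask an LLM, and the `Tool` and `ModelAction` types are hypothetical.

```typescript
// Hypothetical tool shape; a real agent framework will define its own.
interface Tool {
  name: string;
  description: string;
  run: (input: string) => Promise<string>;
}

// Stub standing in for a memory client. A real version might call client.query(input).
const memoryTool: Tool = {
  name: "query_memory",
  description: "Answer a question from long-term user memory.",
  run: async (input) => `stub memory answer for: ${input}`,
};

const tools: Record<string, Tool> = { [memoryTool.name]: memoryTool };

type ModelAction =
  | { type: "call_tool"; tool: string; input: string }
  | { type: "answer"; text: string };

// Stub for the LLM's decision: consult memory once, then answer with what came back.
function decide(question: string, observations: string[]): ModelAction {
  if (observations.length === 0) {
    return { type: "call_tool", tool: memoryTool.name, input: question };
  }
  return { type: "answer", text: observations[observations.length - 1] };
}

// Alternate between deciding and acting until the model answers or the step limit is hit.
async function runAgent(question: string, maxSteps = 3): Promise<string> {
  const observations: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = decide(question, observations);
    if (action.type === "answer") return action.text;
    const tool = tools[action.tool];
    if (!tool) return `Unknown tool: ${action.tool}`;
    observations.push(await tool.run(action.input));
  }
  return "No answer within the step limit.";
}

runAgent("What should I know before booking restaurants?").then((answer) => console.log(answer));
```

The point of the shape is that the loop never handles raw chunks. It passes a question to memory and gets back an observation it can act on.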

The setup is the point

A lot of memory tooling asks developers to accept complexity up front:

  • ingestion pipelines
  • retrieval tuning
  • schema work
  • orchestration glue
  • ranking hacks
  • prompt layering to paper over retrieval flaws

MINNS is opinionated in the opposite direction.

The setup should be simple.
The hard part should happen inside the memory system.

That is why this is such a useful starting point:

import { MinnsClient } from 'minns-sdk';

const client = new MinnsClient({ apiKey: 'your-api-key' });

const answer = await client.query("What's the budget?");

Three lines.

And behind those three lines is a memory system built for the exact problems most agent stacks still struggle with.

Final thought

RAG helped AI systems search text.

It did not solve memory.

If you want agents that can handle real user context, especially multi-hop questions and state changes, you need something better than chunk retrieval.

That is what MINNS is for.

Full quickstart

import { MinnsClient } from 'minns-sdk';

const client = new MinnsClient({
  apiKey: process.env.MINNS_API_KEY!,
});

await client.ingestConversations({
  case_id: 'holiday-planning',
  sessions: [
    {
      session_id: 's1',
      topic: 'trip',
      messages: [
        { role: 'user', content: "I'm planning a trip to the Amalfi Coast for my family." },
        { role: 'user', content: "Budget is about 5000 euros." },
        { role: 'user', content: "Actually, make that 7000 euros." },
        { role: 'user', content: "My daughter Lily is allergic to nuts." },
        { role: 'user', content: "We live in Manchester, so flights from Manchester Airport." },
      ],
    },
  ],
});

const budget = await client.query("What's the current budget?");
console.log('Budget:', budget);

const claims = await client.searchClaims({
  queryText: 'allergies',
});
console.log('Claims:', claims);