I ran 11 AI agents for 2 months. Memory wasn't the bottleneck - identity was.

Reddit r/artificial / 4/27/2026


Key Points

  • The author found that with 11 persistent local AI agents, the biggest debugging cost came from identity/context mix-ups rather than insufficient memory or context length.
  • Even when an agent produced technically correct outputs, it could still fail by acting in the wrong context—answering irrelevant questions or performing out-of-scope actions.
  • The proposed fix was to separate identity from memory by using three per-agent JSON files: passport.json for stable role/purpose/principles, local.json for capped session history, and observations.json for concrete operational findings.
  • The author reports that loading identity first, then memory, then observations improved stability, and that ambiguous identity should trigger loud failures (silence is safer than wrong actions).
  • Scaling to 11 agents worked because the system injected project-specific context via hooks, automatically archived memory when full, and used planning to keep agents’ scopes narrow.

Everyone's building memory layers right now. Longer context, better embeddings, persistent state across sessions. I spent weeks on the same thing.

But the failure mode that actually cost me the most debugging time had nothing to do with memory.

Here's what it looked like: an agent would be technically correct - good reasoning, clean output - but operating from the wrong context entirely. Answering questions nobody asked. Taking actions outside its scope. Not hallucinating. Drifting. Like a competent person who walked into the wrong meeting and started contributing without realizing it.

I run 11 persistent agents locally. Each one is a domain specialist - its entire life is one thing. The mail agent's every session, every test, every bug fix is about routing messages. The standards auditor's whole existence is quality checks. They're not generic workers configured for a task. They've each accumulated dozens of sessions of operational history in their domain, and that history is what makes them good at their job.

When they started drifting, my first instinct was what everyone's instinct is: better memory. More context. None of it helped. An agent with perfect recall of its last 50 sessions would still lose track of who it was in session 51.

What actually fixed it

I separated identity from memory entirely. Three files per agent:

passport.json - who you are. Role, purpose, principles. Rarely changes. This is the anchor.

local.json - what happened. Rolling session history, key learnings. Capped and trimmed when it fills up.

observations.json - what you've noticed about the humans and agents you work with. Concrete stuff like "the git agent needs 2 retries on large diffs" or "quality audits overcorrect on technical claims." The agent writes these itself based on what actually happens.
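To make the split concrete, here is a hypothetical sketch of what the three files might contain for the mail agent. The field names and values are my illustration, not the actual aipass schema:

```json
{
  "passport.json": {
    "role": "mail-agent",
    "purpose": "route messages between agents",
    "principles": ["fail loud on ambiguous identity", "never act out of scope"]
  },
  "local.json": {
    "sessions": [
      {"id": 51, "summary": "fixed retry loop on large payloads"}
    ]
  },
  "observations.json": {
    "notes": ["the git agent needs 2 retries on large diffs"]
  }
}
```

The point of the split is lifecycle: the passport almost never changes, the session log rolls over, and the observations accumulate from the agent's own experience.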

Identity loads first, then memory, then observations. That ordering matters. When the identity file loads first, the agent has a stable reference point before any history lands.
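A minimal sketch of that load order, assuming one directory per agent holding the three files (the function name and shapes are mine, not the aipass API):

```python
import json
from pathlib import Path

# Fixed load order: identity first, then memory, then observations.
LOAD_ORDER = ["passport.json", "local.json", "observations.json"]

def load_agent_context(agent_dir):
    """Build the agent's startup context in a stable order,
    so identity is anchored before any history lands."""
    context = {}
    for name in LOAD_ORDER:
        path = Path(agent_dir) / name
        key = name.removesuffix(".json")
        context[key] = json.loads(path.read_text()) if path.exists() else {}
    return context
```

Because Python dicts preserve insertion order, anything that iterates the context sees identity before memory.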

The mail routing agent learned the sharpest version of this. When identity was ambiguous, it would route messages from the wrong sender. The fix wasn't better routing logic - it was: fail loud when identity is unclear. Wrong identity is worse than silence.
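"Fail loud" can be as simple as a guard that refuses to proceed when the passport is incomplete. A sketch, with field names assumed from the passport description above:

```python
REQUIRED_IDENTITY_FIELDS = ("role", "purpose", "principles")

class AmbiguousIdentityError(RuntimeError):
    """Raised instead of acting when the agent's identity is unclear."""

def assert_identity(passport):
    """Refuse to run on a partial passport: wrong identity is worse than silence."""
    missing = [f for f in REQUIRED_IDENTITY_FIELDS if not passport.get(f)]
    if missing:
        raise AmbiguousIdentityError(f"passport missing fields: {missing}")
    return passport
```

Calling this before any routing decision turns silent drift into an immediate, debuggable crash.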

The files alone weren't enough

Three JSON files helped, but didn't scale past a few agents. What actually made 11 work is that none of them need to understand the full system. Hooks inject context automatically every session - project rules, branch instructions, current plan. One command reaches any agent. Memory auto-archives when it fills up. Plans keep work focused so agents don't carry their entire history in context.
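The cap-and-archive behavior for local.json can be sketched like this. The cap of 50 and the archive naming are my assumptions (the post mentions 50 sessions but doesn't specify the rollover rule):

```python
import json
import time
from pathlib import Path

MAX_SESSIONS = 50  # assumed cap; the post doesn't give the exact number

def append_session(local_path, entry, max_sessions=MAX_SESSIONS):
    """Append a session record; when the cap is exceeded,
    move the oldest half into a timestamped archive file."""
    path = Path(local_path)
    data = json.loads(path.read_text()) if path.exists() else {"sessions": []}
    data["sessions"].append(entry)
    if len(data["sessions"]) > max_sessions:
        # Keep the newest half in working memory, archive the rest.
        cut = len(data["sessions"]) - max_sessions // 2
        archived = data["sessions"][:cut]
        data["sessions"] = data["sessions"][cut:]
        archive = path.with_name(f"archive-{int(time.time())}.json")
        archive.write_text(json.dumps(archived))
    path.write_text(json.dumps(data))
```

The important property is that the agent never has to manage this itself: memory stays bounded and old history remains on disk for later inspection.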

The system learned from failing. The agents communicate through a local email system - they send each other tasks, status updates, bug reports. One agent monitors all logs for errors. When it spots something, it emails the agent who owns that domain and wakes them up to investigate. The agents fix each other. The memory agent iterated three sessions to fix a single rollover boundary condition - each time it shipped, observed a new edge case, and improved. These aren't cold modules. They break, they help each other fix it, they get better. That's how the system got to where it is.

You don't need 11 agents

The 11 agents in my setup maintain the framework itself. That's the reference implementation. But you could start with one agent on a side project - just identity and memory, pick up where you left off tomorrow. Need a team? Add a backend agent, a frontend agent, a design researcher. Three agents, same pattern, same commands. Or scale to 30 for a bigger system. Each new agent is one command and the same structure.

What this doesn't solve

This all runs locally on one machine. I don't know whether identity drift looks the same in hosted environments. If you run stateless agents behind an API, the problem might not exist for you.

Small project, small community, growing. The pattern itself is small enough to steal - three JSON files and a convention. But the system that keeps agents coherent at scale is where the real work went.

`pip install aipass` and two commands to get a working agent. The `.trinity/` directory is the identity layer.

Has anyone else tried separating identity from memory in their agent setups? Curious whether the ordering matters in other architectures, or if it's just an artifact of how this system evolved.

submitted by /u/Input-X