AI Navigate

Built a local AI agent system with biological architecture — 98/100 on open source benchmark (MacBook M4)

Reddit r/LocalLLaMA / 3/14/2026

📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The post introduces a novel AI agent architecture that uses a single brain with parallel specialized neurons instead of multiple agents communicating sequentially.
  • The architecture consists of a Cortex for task classification and model routing, Neurons as specialized units (programmer, designer, database, analyst, researcher), a Hippocampus as short-term memory in RAM, Cortex-memory as SQLite-backed long-term memory, a Heartbeat as a continuous background process, and Channels for communication (terminal, API, Telegram).
  • Model routing is handled by dedicated models: qwen2.5:0.5b for routing, qwen2.5:1.5b for orchestration, qwen2.5-coder:7b for code/DB/UI neurons, and qwen2.5:14b for reasoning and analysis.
  • Benchmark results on a MacBook Pro M4 (24GB) show 345ms for simple responses, ~40 seconds for a 3-neuron parallel task, and an overall 98/100 across 12 tasks with no API or cloud costs, running locally on Ollama.
  • The author plans to open-source both the agent system and the benchmark in the near future and invites questions.

I've been working on a different approach to AI agents. Instead of multiple agents talking to each other sequentially, it's one brain with specialized neurons that activate in parallel depending on the task.

The architecture:

- Cortex → classifies the task and picks the right model

- Neurons → specialized units (programmer, designer, database, analyst, researcher)

- Hippocampus → short-term memory in RAM

- Cortex-memory → long-term memory in SQLite

- Heartbeat → autonomous pulse always running in background

- Channels → how you communicate (terminal, API, Telegram)
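A minimal sketch of how the "one brain, parallel neurons" flow described above could work: a Cortex-style classifier picks which neurons a task activates, and they run concurrently. All names and the keyword-based classifier here are my illustrative assumptions; the post says the real Cortex uses a small LLM for this step, and this is not the author's actual code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical neuron implementations -- stand-ins for the post's
# programmer/designer/database/analyst/researcher units.
NEURONS = {
    "programmer": lambda task: f"code for: {task}",
    "designer":   lambda task: f"UI for: {task}",
    "database":   lambda task: f"schema for: {task}",
}

def cortex_classify(task: str) -> list[str]:
    """Toy classifier: decide which neurons a task activates.
    (The described system reportedly uses a small LLM here.)"""
    keywords = {"programmer": "code", "designer": "ui", "database": "db"}
    active = [name for name, kw in keywords.items() if kw in task.lower()]
    return active or ["programmer"]  # fall back to a default neuron

def run_task(task: str) -> dict[str, str]:
    """Activate the selected neurons in parallel and collect results."""
    active = cortex_classify(task)
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(NEURONS[name], task) for name in active}
        return {name: f.result() for name, f in futures.items()}

print(run_task("build a db schema and ui for a blog"))
```

The point of the pattern is that independent specialists (frontend, backend, database) do their work simultaneously rather than waiting on a sequential agent-to-agent conversation.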

Model routing is automatic:

- qwen2.5:0.5b for routing (always active, ~200 tok/s)

- qwen2.5:1.5b for orchestration

- qwen2.5-coder:7b for code/DB/UI neurons

- qwen2.5:14b for reasoning and analysis
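The routing tiers above can be sketched as a simple lookup table. The model tags come straight from the post; the category names and the default-fallback behavior are my assumptions about how such a router might be organized.

```python
# Hypothetical mapping of task categories to the Ollama model tags
# listed in the post; the category names are illustrative assumptions.
MODEL_ROUTES = {
    "routing":       "qwen2.5:0.5b",      # always active, ~200 tok/s
    "orchestration": "qwen2.5:1.5b",
    "code":          "qwen2.5-coder:7b",  # code neuron
    "db":            "qwen2.5-coder:7b",  # database neuron
    "ui":            "qwen2.5-coder:7b",  # UI neuron
    "reasoning":     "qwen2.5:14b",
    "analysis":      "qwen2.5:14b",
}

def pick_model(category: str) -> str:
    """Return the model tag for a task category, falling back to the
    lightweight router model for anything unrecognized."""
    return MODEL_ROUTES.get(category, "qwen2.5:0.5b")

print(pick_model("code"))  # qwen2.5-coder:7b
```

This tiering keeps the cheapest model hot for classification while reserving the 14B model for the tasks that actually need it, which is what makes the setup fit on a 24GB machine.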

Benchmark results on MacBook Pro M4 (24GB):

- 345ms for simple responses

- 40s for 3-neuron parallel tasks (frontend + backend + database simultaneously)

- 98/100 overall on 12 tasks across 5 levels

No API costs. No cloud. Everything runs locally on Ollama.
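Since everything runs through Ollama's local HTTP API, a neuron's model call could look like the sketch below. The endpoint and payload fields follow Ollama's documented `/api/generate` interface; the helper function name is mine, and nothing here is the author's actual code.

```python
import json
import urllib.request

# Ollama's default local endpoint -- no API key, no cloud.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Actually sending it requires a running Ollama instance:
# resp = urllib.request.urlopen(build_request("qwen2.5:0.5b", "classify: fix the login bug"))
# print(json.loads(resp.read())["response"])
```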

Both the agent system and the benchmark will be fully open sourced in the near future, once I have a more complete version with additional features.

Will post here when they're ready.

Happy to answer questions about the architecture.

submitted by /u/Kevin_Neamt