Building Agent Arena: Using Valkey as the Nervous System for Multi-Agent AI

Dev.to / 4/25/2026

Tags: Opinion, Developer Stack & Infrastructure, Tools & Practical Usage, Models & Research

Key Points

  • The project “Agent Arena: Fact or Fake” demonstrates multi-agent coordination in a real-time multiplayer setting by having four autonomous agents collaborate through a shared substrate called Valkey.
  • It targets production needs beyond “LLM intelligence,” including shared state, event-driven handoffs, long-term memory, and observability/recovery under failures.
  • The system is organized around a core design principle of no direct agent-to-agent calls, with all coordination routed through Valkey using JSON keys for state, Pub/Sub for orchestration events, and a vector index (FT.CREATE/FT.SEARCH) for long-term recall.
  • The post explains reusable architecture and developer patterns, including how components like Researcher, Writer, Editor, and Game Master exchange data and how players interact via WebSockets.
  • By structuring agent interactions around Valkey rather than tightly coupled API chains, the approach aims to make multi-agent systems more robust and extensible.

Most AI agent demos prove intelligence. Very few prove coordination.

In this project, we built Agent Arena: Fact or Fake, a real-time multiplayer game where four autonomous agents collaborate through one shared substrate: Valkey.

This post walks through the architecture, implementation details, tradeoffs, and developer patterns you can reuse.

Problem Statement

An LLM can generate content. But production-grade multi-agent systems need more:

  • Shared state across independent workers
  • Event-driven handoffs without tight coupling
  • Long-term memory that informs future behavior
  • Observability and recovery under failure

Without these, agent systems become brittle chains of API calls.

System Overview

Agents in this app:

  • Researcher: generates factual/misleading claim candidates (Ollama)
  • Writer: rewrites claim into player-facing question (Ollama)
  • Editor: validates truth + confidence (OpenAI)
  • Game Master: orchestrates timed rounds, scoring, leaderboard

Players join over WebSocket and answer FACT or FAKE in real time.

Core Design Principle

No direct agent-to-agent calls.

Every handoff is done through Valkey:

  • State -> JSON keys
  • Orchestration -> Pub/Sub events
  • Long-term recall -> vector index (FT.CREATE / FT.SEARCH)

Architecture Diagram

Players (WS) -> FastAPI -> Valkey (JSON + Pub/Sub + Vector)
                              |        |         |
                              v        v         v
                         Researcher  Writer   Editor
                               \       |       /
                                v      v      v
                                 Game Master

Implementation Breakdown

1) Shared State (Valkey JSON)

Agent outputs and game state are written into namespaced keys:

  • game:state:{room_id}
  • agent:researcher:output:{room}
  • agent:writer:draft:{room}
  • agent:editor:review:{room}
  • game:round:{room}:{round}

# backend/services/state_store.py
async def set_game_state(self, room_id: str, state: dict[str, Any]) -> None:
    await self.valkey.set_json(f'game:state:{room_id}', state)

The set_json/get_json layer falls back to plain SET/GET with serialized JSON strings when the JSON module (RedisJSON-compatible) is unavailable, keeping local demos robust.

2) Event-Driven Orchestration (Valkey Pub/Sub)

Every workflow transition publishes an event envelope:

# backend/services/event_bus.py
await self.valkey.publish(channel, envelope.model_dump_json())

Each agent subscribes only to channels it cares about and reacts to events.

This enables:

  • decoupled scaling
  • independent process restarts
  • clean failure boundaries

3) Long-Term Memory (ValkeySearch vectors)

Questions are embedded and stored as memory documents:

# backend/services/vector_memory.py
await self.valkey.set_json(f'memory:question:{round_id}', {
  'question': question,
  'topic': topic,
  'difficulty': difficulty,
  'player_accuracy': player_accuracy,
  'embedding': emb,
})

Vector search retrieves similar prior questions to reduce repetition and improve topic progression.
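A sketch of the KNN lookup side, assuming the index was created with FT.CREATE over a FLOAT32 VECTOR field (the index name idx:memory and field name embedding are assumptions, not the project's actual names):

```python
import struct

def knn_query(embedding: list[float], k: int = 3) -> tuple[str, bytes]:
    """Build the FT.SEARCH query string and packed vector blob for a KNN lookup."""
    query = f"*=>[KNN {k} @embedding $vec AS score]"
    # FLOAT32, little-endian, matching a FLOAT32 vector field
    blob = struct.pack(f"<{len(embedding)}f", *embedding)
    return query, blob

# usage against a live server (sketch):
# query, blob = knn_query(emb)
# await client.execute_command("FT.SEARCH", "idx:memory", query,
#                              "PARAMS", 2, "vec", blob,
#                              "SORTBY", "score", "DIALECT", 2)
```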

Round Lifecycle

The event chain per round:

  1. GAME_START / START_ROUND -> Researcher emits RESEARCH_DONE
  2. Writer reacts -> emits DRAFT_READY
  3. Editor reacts -> emits VALIDATION_COMPLETE
  4. Game Master launches round -> emits NEW_QUESTION
  5. Players answer via WebSocket
  6. Game Master emits ROUND_RESULT + LEADERBOARD_UPDATE + ROUND_COMPLETE

We also added cycle_id propagation to guard against stale or duplicate downstream processing.
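The guard boils down to rejecting anything that is not from the active cycle or has already been seen. An illustrative in-process version (the real project may track this differently):

```python
class CycleGuard:
    """Accept each (cycle_id, event) pair at most once, and only for the active cycle."""

    def __init__(self) -> None:
        self.current_cycle: str | None = None
        self.seen: set[tuple[str, str]] = set()

    def start_cycle(self, cycle_id: str) -> None:
        self.current_cycle = cycle_id
        self.seen.clear()

    def accept(self, cycle_id: str, event: str) -> bool:
        if cycle_id != self.current_cycle:
            return False  # stale: belongs to an earlier round
        key = (cycle_id, event)
        if key in self.seen:
            return False  # duplicate delivery
        self.seen.add(key)
        return True
```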

Reliability Improvements We Added

  • Preserved player score on reconnect (HSETNX)
  • Reset scores on a new game start in the same room
  • Event handler safety: per-event exceptions don’t kill the whole agent loop
  • WebSocket payload validation (invalid_json, invalid_round_id, round_not_active)
  • Health endpoint checks Valkey reachability + JSON/Search capability
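The reconnect fix is worth spelling out: HSETNX only writes when the field is absent, so a returning player is not reset to zero. A sketch assuming an async redis-py-style client (function and key names are illustrative):

```python
async def join_player(client, room_id: str, player_id: str) -> int:
    """Register a player in the room's score hash without clobbering an existing score."""
    key = f"game:scores:{room_id}"
    # HSETNX writes 0 only if the field does not exist yet,
    # so a reconnecting player keeps their accumulated score.
    await client.hsetnx(key, player_id, 0)
    return int(await client.hget(key, player_id))
```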

Operational Checks

Use this before demos:

curl -s http://127.0.0.1:8000/health | jq

You’ll see:

  • reachable
  • json_module
  • search_module
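A hypothetical version of the capability probe behind that endpoint (the specific probe commands, JSON.GET and FT._LIST, are our assumptions; both error out with "unknown command" when the module is missing):

```python
async def health(client) -> dict[str, bool]:
    """Probe reachability plus JSON and Search module availability."""
    out = {"reachable": False, "json_module": False, "search_module": False}
    try:
        await client.ping()
        out["reachable"] = True
    except Exception:
        return out  # unreachable: skip the module probes
    try:
        # returns None on a missing key, raises if the JSON module is absent
        await client.execute_command("JSON.GET", "health:probe")
        out["json_module"] = True
    except Exception:
        pass
    try:
        await client.execute_command("FT._LIST")
        out["search_module"] = True
    except Exception:
        pass
    return out
```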

Environment Management (Varlock-first)

The app loads settings from process environment variables, which makes it a good fit for Varlock-managed secrets/config.

Example runtime:

varlock run -- uvicorn main:app --reload --port 8000
varlock run -- python scripts/run_agents.py --agents researcher writer editor game_master

Test Strategy

Minimal integration test included:

pytest -q tests/test_integration_round.py

It validates: start game -> first round -> leaderboard update.

Why This Pattern Matters

Compared with direct API chaining between agents, this design gives:

  • Better fault isolation
  • Better observability
  • Easier horizontal scaling
  • Simpler mental model for distributed workflows

What to Improve Next

  • Move orchestration from Pub/Sub to Valkey Streams (durable delivery)
  • Add event idempotency store + dead-letter handling
  • Add OpenTelemetry traces for event lifecycle
  • Add CI pipeline for contract tests + reliability tests
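The Streams migration is mostly a change in the read side: XREADGROUP plus an explicit XACK gives at-least-once delivery instead of Pub/Sub's fire-and-forget. A sketch of one consumer pass, assuming an async redis-py-style client (stream, group, and field names are assumptions):

```python
import json

async def drain_once(client, stream: str, group: str, consumer: str, handler) -> int:
    """Read one batch for this consumer, handle and ack each entry; return the count."""
    batches = await client.xreadgroup(group, consumer, {stream: ">"}, count=10, block=5000)
    handled = 0
    for _, messages in batches or []:
        for msg_id, fields in messages:
            handler(json.loads(fields["payload"]))
            # ack only after the handler succeeds, so a crash leaves the
            # entry pending for redelivery instead of losing it
            await client.xack(stream, group, msg_id)
            handled += 1
    return handled
```

An agent would create its group once with XGROUP CREATE (MKSTREAM), then call this in a loop; unacked entries survive restarts, which is exactly what Pub/Sub cannot offer.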

LLMs provide reasoning, but coordination makes systems reliable.

If you’re building multi-agent workflows, treat Valkey as your shared cognition fabric, not just a cache.

Screenshots

Example outputs 1–7 (screenshots omitted in this text version).

GitHub: https://github.com/harishkotra/neuroloop