Building Agent Arena: Using Valkey as the Nervous System for Multi-Agent AI

Dev.to / 4/25/2026

Tags: Opinion, Developer Stack & Infrastructure, Tools & Practical Usage, Models & Research

Key Points

  • The project “Agent Arena: Fact or Fake” demonstrates multi-agent coordination in a real-time multiplayer setting by having four autonomous agents collaborate through a shared substrate called Valkey.
  • It targets production needs beyond “LLM intelligence,” including shared state, event-driven handoffs, long-term memory, and observability/recovery under failures.
  • The system is organized around a core design principle of no direct agent-to-agent calls, with all coordination routed through Valkey using JSON keys for state, Pub/Sub for orchestration events, and a vector index (FT.CREATE/FT.SEARCH) for long-term recall.
  • The post explains reusable architecture and developer patterns, including how components like Researcher, Writer, Editor, and Game Master exchange data and how players interact via WebSockets.
  • By structuring agent interactions around Valkey rather than tightly coupled API chains, the approach aims to make multi-agent systems more robust and extensible.

Most AI agent demos prove intelligence. Very few prove coordination.

In this project, we built Agent Arena: Fact or Fake, a real-time multiplayer game where four autonomous agents collaborate through one shared substrate: Valkey.

This post walks through the architecture, implementation details, tradeoffs, and developer patterns you can reuse.

Problem Statement

An LLM can generate content. But production-grade multi-agent systems need more:

  • Shared state across independent workers
  • Event-driven handoffs without tight coupling
  • Long-term memory that informs future behavior
  • Observability and recovery under failure

Without these, agent systems become brittle chains of API calls.

System Overview

Agents in this app:

  • Researcher: generates factual/misleading claim candidates (Ollama)
  • Writer: rewrites claim into player-facing question (Ollama)
  • Editor: validates truth + confidence (OpenAI)
  • Game Master: orchestrates timed rounds, scoring, leaderboard

Players join over WebSocket and answer FACT or FAKE in real time.

Core Design Principle

No direct agent-to-agent calls.

Every handoff is done through Valkey:

  • State -> JSON keys
  • Orchestration -> Pub/Sub events
  • Long-term recall -> vector index (FT.CREATE / FT.SEARCH)

Architecture Diagram

Players (WS) -> FastAPI -> Valkey (JSON + Pub/Sub + Vector)
                              |        |         |
                              v        v         v
                         Researcher  Writer   Editor
                               \       |       /
                                v      v      v
                                 Game Master

Implementation Breakdown

1) Shared State (Valkey JSON)

Agent outputs and game state are written into namespaced keys:

  • game:state:{room_id}
  • agent:researcher:output:{room}
  • agent:writer:draft:{room}
  • agent:editor:review:{room}
  • game:round:{room}:{round}

# backend/services/state_store.py
async def set_game_state(self, room_id: str, state: dict[str, Any]) -> None:
    await self.valkey.set_json(f'game:state:{room_id}', state)

The set_json/get_json layer falls back to plain SET/GET with serialized JSON strings when the JSON module (RedisJSON-compatible) is unavailable, keeping local demos robust.

2) Event-Driven Orchestration (Valkey Pub/Sub)

Every workflow transition publishes an event envelope:

# backend/services/event_bus.py
await self.valkey.publish(channel, envelope.model_dump_json())

Each agent subscribes only to channels it cares about and reacts to events.

This enables:

  • decoupled scaling
  • independent process restarts
  • clean failure boundaries

3) Long-Term Memory (ValkeySearch vectors)

Questions are embedded and stored as memory documents:

# backend/services/vector_memory.py
await self.valkey.set_json(f'memory:question:{round_id}', {
  'question': question,
  'topic': topic,
  'difficulty': difficulty,
  'player_accuracy': player_accuracy,
  'embedding': emb,
})

Vector search retrieves similar prior questions to reduce repetition and improve topic progression.
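A sketch of the KNN lookup side, assuming the index was created with FT.CREATE over a FLOAT32 VECTOR field (the index name idx:memory and field name embedding are assumptions, not the project's actual names):

```python
import struct

def knn_query(embedding: list[float], k: int = 3) -> tuple[str, bytes]:
    """Build the FT.SEARCH query string and packed vector blob for a KNN lookup."""
    query = f"*=>[KNN {k} @embedding $vec AS score]"
    # FLOAT32, little-endian, matching a FLOAT32 vector field
    blob = struct.pack(f"<{len(embedding)}f", *embedding)
    return query, blob

# usage against a live server (sketch):
# query, blob = knn_query(emb)
# await client.execute_command("FT.SEARCH", "idx:memory", query,
#                              "PARAMS", 2, "vec", blob,
#                              "SORTBY", "score", "DIALECT", 2)
```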

Round Lifecycle

The event chain per round:

  1. GAME_START / START_ROUND -> Researcher emits RESEARCH_DONE
  2. Writer reacts -> emits DRAFT_READY
  3. Editor reacts -> emits VALIDATION_COMPLETE
  4. Game Master launches round -> emits NEW_QUESTION
  5. Players answer via WebSocket
  6. Game Master emits ROUND_RESULT + LEADERBOARD_UPDATE + ROUND_COMPLETE

We also added cycle_id propagation to guard against stale or duplicate downstream processing.
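The guard boils down to rejecting anything that is not from the active cycle or has already been seen. An illustrative in-process version (the real project may track this differently):

```python
class CycleGuard:
    """Accept each (cycle_id, event) pair at most once, and only for the active cycle."""

    def __init__(self) -> None:
        self.current_cycle: str | None = None
        self.seen: set[tuple[str, str]] = set()

    def start_cycle(self, cycle_id: str) -> None:
        self.current_cycle = cycle_id
        self.seen.clear()

    def accept(self, cycle_id: str, event: str) -> bool:
        if cycle_id != self.current_cycle:
            return False  # stale: belongs to an earlier round
        key = (cycle_id, event)
        if key in self.seen:
            return False  # duplicate delivery
        self.seen.add(key)
        return True
```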

Reliability Improvements We Added

  • Preserved player score on reconnect (HSETNX)
  • Reset scores on a new game start in the same room
  • Event handler safety: per-event exceptions don’t kill the whole agent loop
  • WebSocket payload validation (invalid_json, invalid_round_id, round_not_active)
  • Health endpoint checks Valkey reachability + JSON/Search capability
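The reconnect fix is worth spelling out: HSETNX only writes when the field is absent, so a returning player is not reset to zero. A sketch assuming an async redis-py-style client (function and key names are illustrative):

```python
async def join_player(client, room_id: str, player_id: str) -> int:
    """Register a player in the room's score hash without clobbering an existing score."""
    key = f"game:scores:{room_id}"
    # HSETNX writes 0 only if the field does not exist yet,
    # so a reconnecting player keeps their accumulated score.
    await client.hsetnx(key, player_id, 0)
    return int(await client.hget(key, player_id))
```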

Operational Checks

Use this before demos:

curl -s http://127.0.0.1:8000/health | jq

You’ll see:

  • reachable
  • json_module
  • search_module
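A hypothetical version of the capability probe behind that endpoint (the specific probe commands, JSON.GET and FT._LIST, are our assumptions; both error out with "unknown command" when the module is missing):

```python
async def health(client) -> dict[str, bool]:
    """Probe reachability plus JSON and Search module availability."""
    out = {"reachable": False, "json_module": False, "search_module": False}
    try:
        await client.ping()
        out["reachable"] = True
    except Exception:
        return out  # unreachable: skip the module probes
    try:
        # returns None on a missing key, raises if the JSON module is absent
        await client.execute_command("JSON.GET", "health:probe")
        out["json_module"] = True
    except Exception:
        pass
    try:
        await client.execute_command("FT._LIST")
        out["search_module"] = True
    except Exception:
        pass
    return out
```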

Environment Management (Varlock-first)

The app loads settings from process environment variables, which makes it a good fit for Varlock-managed secrets/config.

Example runtime:

varlock run -- uvicorn main:app --reload --port 8000
varlock run -- python scripts/run_agents.py --agents researcher writer editor game_master

Test Strategy

Minimal integration test included:

pytest -q tests/test_integration_round.py

It validates: start game -> first round -> leaderboard update.

Why This Pattern Matters

Compared with direct API chaining between agents, this design gives:

  • Better fault isolation
  • Better observability
  • Easier horizontal scaling
  • Simpler mental model for distributed workflows

What to Improve Next

  • Move orchestration from Pub/Sub to Valkey Streams (durable delivery)
  • Add event idempotency store + dead-letter handling
  • Add OpenTelemetry traces for event lifecycle
  • Add CI pipeline for contract tests + reliability tests
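The Streams migration is mostly a change in the read side: XREADGROUP plus an explicit XACK gives at-least-once delivery instead of Pub/Sub's fire-and-forget. A sketch of one consumer pass, assuming an async redis-py-style client (stream, group, and field names are assumptions):

```python
import json

async def drain_once(client, stream: str, group: str, consumer: str, handler) -> int:
    """Read one batch for this consumer, handle and ack each entry; return the count."""
    batches = await client.xreadgroup(group, consumer, {stream: ">"}, count=10, block=5000)
    handled = 0
    for _, messages in batches or []:
        for msg_id, fields in messages:
            handler(json.loads(fields["payload"]))
            # ack only after the handler succeeds, so a crash leaves the
            # entry pending for redelivery instead of losing it
            await client.xack(stream, group, msg_id)
            handled += 1
    return handled
```

An agent would create its group once with XGROUP CREATE (MKSTREAM), then call this in a loop; unacked entries survive restarts, which is exactly what Pub/Sub cannot offer.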

LLMs provide reasoning, but coordination makes systems reliable.

If you’re building multi-agent workflows, treat Valkey as your shared cognition fabric, not just a cache.

Screenshots

Example outputs 1–7 (screenshots omitted in this text version).

GitHub: https://github.com/harishkotra/neuroloop