One Open Source Project a Day (No.43): MiroFish - Predicting the Future with Swarm Intelligence

Dev.to / 4/20/2026

Key Points

MiroFish is an open-source swarm-intelligence prediction engine that simulates thousands of AI agents to evolve group behavior and derive forecast trajectories from emergent dynamics rather than static models.
It frames prediction as generating “stories” and structured trend reports by re-enacting how a crowd might respond, avoiding limitations of conventional data→train→number pipelines.
The project’s learning topics include a five-stage simulation pipeline (from knowledge-graph construction to interactive reporting) and how GraphRAG injects domain knowledge into agents.
It also highlights persistent cross-session memory via Zep Cloud and a “god-mode” variable-injection mechanism for runtime what-if analysis.
With 56k+ stars and 8.6k+ forks, the author (666ghj) positions MiroFish as part of a broader pipeline, complementing another project (BettaFish) to support end-to-end data collection and simulation-based prediction.

Introduction

"Don't predict individuals — simulate the swarm."

This is article No.43 in the "One Open Source Project a Day" series. Today's project is MiroFish (GitHub: https://github.com/666ghj/MiroFish).

The standard playbook for prediction tools is: collect data → train a model → output a number. But this approach has a fundamental blind spot: models are static, while the real world emerges from dynamic interaction. Public sentiment, market behavior, policy responses — these are collective phenomena that emerge from countless individual interactions. You can't fit a public opinion storm with linear regression.

MiroFish takes a different approach entirely: instead of fitting, it re-enacts. By running thousands of virtual "people" through a simulated platform, it lets group behavior evolve naturally and extracts the trajectory as a prediction. This isn't predicting numbers — it's predicting stories.

With 56k+ stars and 8.6k+ forks, MiroFish is one of the most-watched projects in the multi-agent simulation space. The author is 666ghj (BaiFu), a student at Beijing University of Posts and Telecommunications with support from Shanda Group, and also the creator of BettaFish (40.5k stars). Together, the two projects form a complete "data collection → simulation prediction" pipeline.

What You'll Learn

MiroFish's core philosophy: why the project argues that swarm simulation captures collective dynamics better than statistical prediction
The five-stage simulation pipeline: from knowledge graph construction to deep interactive reporting
GraphRAG's role in simulation: injecting domain knowledge into agents
Zep Cloud cross-session memory: giving agents a persistent sense of history
"God-mode" variable injection: runtime what-if analysis

Prerequisites

Basic understanding of Multi-Agent Systems (MAS)
Python basics (optional, for understanding configuration logic)
Interest in public opinion analysis or trend forecasting

Project Background

What Is It?

MiroFish is a swarm intelligence prediction engine that builds a virtual society populated by thousands of AI agents, simulates how a real crowd would respond to a given topic, and generates a structured trend forecast report from the emergent behavior.

The name draws on collective intelligence as an emergent phenomenon: just as a school of fish forms patterns far beyond any single fish's capability, human social dynamics grow out of individual interactions into collective outcomes that no individual planned.

The core problem it solves:

Traditional approach:
  Historical data → Statistical model → "Next month's sales will be X"
  Problem: Can't explain why; can't handle black-swan events

MiroFish approach:
  Seed knowledge + Multi-agent simulation → Group behavior evolution
  → "Under these conditions, the crowd will respond like this"
  Advantage: Explainable, open to intervention, supports what-if analysis
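
To make the contrast concrete, here is a minimal sketch of group behavior emerging from pairwise interaction. It is a toy bounded-confidence opinion model with plain numeric agents, not MiroFish's LLM-driven agents; the function name `simulate_opinions` and the parameters `epsilon` and `mu` are assumptions chosen for illustration.

```python
import random


def simulate_opinions(n_agents=200, rounds=50, epsilon=0.2, mu=0.5, seed=0):
    """Toy bounded-confidence model.

    Two agents only influence each other when their opinions are already
    within `epsilon`; they then move toward each other by a factor `mu`.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    opinions = [rng.random() for _ in range(n_agents)]
    spread_per_round = []  # (max - min) opinion after each round

    for _ in range(rounds):
        for _ in range(n_agents):
            i, j = rng.sample(range(n_agents), 2)
            diff = opinions[j] - opinions[i]
            if abs(diff) < epsilon:
                opinions[i] += mu * diff
                opinions[j] -= mu * diff
        spread_per_round.append(max(opinions) - min(opinions))

    return opinions, spread_per_round


final_opinions, spread = simulate_opinions()
print(f"spread after round 1: {spread[0]:.2f}, after round {len(spread)}: {spread[-1]:.2f}")
```

No agent is told where the group should end up; the trajectory in `spread` is a by-product of local interactions, which is the property the MiroFish approach relies on at a much larger scale.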

About the Author

GitHub: 666ghj
Background: Student at Beijing University of Posts and Telecommunications; supported by Shanda Group
Sister project: BettaFish (40.5k ⭐) — collects sentiment data from 30+ platforms, feeding into MiroFish as the data source
Vision: Complete loop: BettaFish collects → MiroFish simulates and predicts

Project Stats

⭐ GitHub Stars: 56,400+
🍴 Forks: 8,600+
📦 Latest Version: v0.1.2
📄 License: AGPL-3.0 (copyleft; anyone running a modified version as a network service must offer its source to users)
🌐 Language breakdown: Python 57.6% + Vue.js 41.2%
🤝 Core dependencies: CAMEL-AI OASIS, camel-ai 0.2.78, Zep Cloud 3.13.0, GraphRAG, PyMuPDF

Key Features

The Core: Five-Stage Simulation Pipeline

MiroFish runs through five strictly sequential stages:

Stage 1: Graph Building
  Seed documents (PDF/URL) → PyMuPDF extraction → GraphRAG knowledge graph
  Output: Domain knowledge graph (entities + relationships)

Stage 2: Environment Setup
  CAMEL-AI OASIS initializes virtual platforms
  Each agent is assigned a Zep Cloud long-term memory
  Knowledge graph context injected into agent context

Stage 3: Parallel Simulation
  Dual-platform simultaneous run (for confidence validation)
  Thousands of agents interact across N rounds
  Agent behavior driven by LLM + constrained by persona profiles

Stage 4: Report Generation
  Aggregate simulation trajectories
  LLM summarizes collective behavior patterns
  Generate structured trend report

Stage 5: Deep Interaction
  User can ask natural-language questions about the report
  Supports "God-mode" variable injection (runtime what-if)
  RAG retrieves simulation records to answer questions
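
The strictly sequential hand-off between stages can be sketched as a list of functions that each take and return a shared state dictionary. This is a hedged illustration of the orchestration pattern only: the stage function names, the `State` alias, and the placeholder outputs are assumptions, not MiroFish's actual backend code.

```python
from typing import Any, Callable

State = dict[str, Any]


def build_graph(state: State) -> State:
    # Stage 1: stand-in for document extraction plus graph construction
    return {**state, "graph": {"documents": len(state["seed_docs"])}}


def setup_environment(state: State) -> State:
    # Stage 2: stand-in for platform and agent initialization
    return {**state, "agents": [f"agent_{i}" for i in range(state["agent_count"])]}


def run_simulation(state: State) -> State:
    # Stage 3: stand-in for the multi-round agent interaction
    return {**state, "trajectory": [f"round_{r}" for r in range(state["rounds"])]}


def generate_report(state: State) -> State:
    # Stage 4: stand-in for aggregation and LLM summarization
    summary = f"{len(state['trajectory'])} rounds, {len(state['agents'])} agents"
    return {**state, "report": summary}


def open_interaction(state: State) -> State:
    # Stage 5: stand-in for enabling follow-up questions
    return {**state, "chat_ready": True}


PIPELINE: list[tuple[str, Callable[[State], State]]] = [
    ("graph_building", build_graph),
    ("environment_setup", setup_environment),
    ("parallel_simulation", run_simulation),
    ("report_generation", generate_report),
    ("deep_interaction", open_interaction),
]


def run_pipeline(initial: State) -> State:
    state = dict(initial)
    for name, stage in PIPELINE:
        state = stage(state)  # each stage sees everything earlier stages produced
        state["completed"] = state.get("completed", []) + [name]
    return state


result = run_pipeline({"seed_docs": ["policy.pdf"], "agent_count": 3, "rounds": 2})
print(result["completed"])
print(result["report"])
```

Passing one state object through the stages makes the dependency order explicit: report generation cannot run before a trajectory exists.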

GraphRAG: Injecting Domain Knowledge

MiroFish uses GraphRAG instead of standard RAG — and the reason is intuitive:

# Standard RAG: document → vector → retrieve similar chunks
# Problem: can only answer "what are the facts," can't reason about relationships

# GraphRAG: document → entity extraction → relationship graph → graph traversal
# Advantage: can reason "A affects B, what does B do to C?"

In simulation, agents need to understand complex causal chains (e.g., "Policy X influences Group Y's behavior"). GraphRAG's graph structure handles this kind of relational reasoning far better than vector retrieval alone.
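
The multi-hop reasoning described here comes down to walking edges in a relation graph. The sketch below uses a hand-written dictionary and a breadth-first search rather than the GraphRAG library; the `EDGES` data and the `causal_chain` function are illustrative assumptions.

```python
from collections import deque

# Toy relation graph: source -> list of (relation, target)
EDGES = {
    "Subsidy Policy": [("stimulates", "Purchase Intent")],
    "Purchase Intent": [("drives", "Market Share")],
    "Tesla": [("competes with", "BYD")],
}


def causal_chain(start, goal):
    """Breadth-first search for the shortest chain of relations.

    Returns a list of (source, relation, target) hops, or None when
    the goal cannot be reached from the start node.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in EDGES.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None


for src, rel, dst in causal_chain("Subsidy Policy", "Market Share"):
    print(f"{src} --{rel}--> {dst}")
```

A pure vector search could surface each fact separately, but it would not connect "Subsidy Policy" to "Market Share" unless a single chunk happened to mention both.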

Zep Cloud: Agents That Remember

Every agent gets its own Zep Cloud memory space:

# Each agent has persistent memory
# (`ZepMemory` and `zep_client` are illustrative names; the exact Zep Cloud SDK calls may differ)
agent_memory = ZepMemory(
    session_id=f"agent_{agent_id}",
    zep_client=zep_client
)

# Between simulation rounds, agents can "remember" prior interactions
# This keeps agent behavior coherent and closer to how people actually reason

This solves a classic problem in multi-agent simulation: if agents "forget" everything between rounds, their behavior lacks continuity and prediction credibility drops significantly.
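
The continuity idea can be shown with a local stand-in that keeps a separate history per agent session. This sketch does not call Zep Cloud; the class `InMemoryAgentMemory` and its `add` and `recall` methods are hypothetical names used only to illustrate per-agent, cross-round recall.

```python
from collections import defaultdict


class InMemoryAgentMemory:
    """Local stand-in for a per-agent memory service (not the Zep Cloud API)."""

    def __init__(self):
        self._sessions = defaultdict(list)

    def add(self, session_id, round_no, text):
        # Store what the agent experienced in a given round
        self._sessions[session_id].append((round_no, text))

    def recall(self, session_id, last_n=3):
        # Return the most recent entries, oldest first
        return [text for _, text in self._sessions[session_id][-last_n:]]


memory = InMemoryAgentMemory()
events = ["saw subsidy news", "argued with agent_7", "shared a price chart"]
for round_no, event in enumerate(events, start=1):
    memory.add("agent_42", round_no, event)

# Before the next round, recent memories are folded into the agent's prompt
prompt_context = "; ".join(memory.recall("agent_42", last_n=2))
print(prompt_context)
```

A hosted memory service adds persistence across sessions and retrieval by relevance, but the contract is the same: write after each round, read before the next.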

"God-Mode" Variable Injection

One of MiroFish's most distinctive features is injecting external variables at runtime:

# Example: "What if a competitor suddenly cuts prices by 20%?"
god_mode_injection = {
    "event": "competitor_price_cut",
    "magnitude": -0.20,
    "timing": "round_15",
    "affected_agents": "all_consumer_agents"
}

# After injection, the simulation responds to this "external shock" in the following rounds
# Output: Group sentiment shift + behavioral evolution trajectory

This transforms MiroFish from a "what will happen" tool into a "what happens if I do this" decision support tool.
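
A what-if run is easiest to read as a comparison between a baseline and a shocked simulation that share the same random seed. The sketch below is a toy numeric model under that assumption; the `run` function and the `round` and `magnitude` keys are hypothetical and simpler than the injection format shown above.

```python
import random


def run(rounds=30, n_agents=100, injection=None, seed=1):
    """Return the mean group sentiment after each round.

    `injection` is an optional dict with a `round` number and a `magnitude`
    that is added to every agent's sentiment at the start of that round.
    """
    rng = random.Random(seed)
    sentiment = [rng.uniform(-0.1, 0.1) for _ in range(n_agents)]
    history = []
    for r in range(1, rounds + 1):
        if injection and r == injection["round"]:
            sentiment = [s + injection["magnitude"] for s in sentiment]
        mean = sum(sentiment) / n_agents
        # Agents drift toward the group mean, with a little individual noise
        sentiment = [s + 0.1 * (mean - s) + rng.gauss(0, 0.01) for s in sentiment]
        history.append(sum(sentiment) / n_agents)
    return history


baseline = run()
shocked = run(injection={"round": 15, "magnitude": -0.20})

print(f"round 14 gap: {shocked[13] - baseline[13]:+.2f}")
print(f"round 30 gap: {shocked[-1] - baseline[-1]:+.2f}")
```

Because both runs draw the same random numbers, any gap between the two curves is attributable to the injected event rather than to noise.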

Dual-Platform Parallel Simulation

MiroFish runs the same simulation on two independent virtual platforms simultaneously:

Platform A ─── Results A ──┐
                            ├── Confidence scoring + combined report
Platform B ─── Results B ──┘

Both platforms converge → high-confidence conclusion
Platforms diverge → flagged as "uncertainty zone," users cautioned
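
The article does not say how the confidence score is computed, so the following is only one plausible sketch: score agreement as one minus the normalized mean gap between the two sentiment curves, and flag rounds where the gap exceeds a tolerance. The function names, `scale`, and `tolerance` are all assumptions.

```python
def agreement_score(traj_a, traj_b, scale=2.0):
    """1.0 when the curves coincide, 0.0 when they differ by `scale` on average.

    `scale` defaults to 2.0 on the assumption that sentiment lies in [-1, 1].
    """
    if len(traj_a) != len(traj_b) or not traj_a:
        raise ValueError("trajectories must be non-empty and equally long")
    mean_gap = sum(abs(a - b) for a, b in zip(traj_a, traj_b)) / len(traj_a)
    return max(0.0, 1.0 - mean_gap / scale)


def uncertain_rounds(traj_a, traj_b, tolerance=0.3):
    """1-indexed rounds where the two platforms disagree by more than `tolerance`."""
    return [
        i
        for i, (a, b) in enumerate(zip(traj_a, traj_b), start=1)
        if abs(a - b) > tolerance
    ]


platform_a = [0.10, 0.20, 0.35, 0.50]
platform_b = [0.12, 0.18, 0.30, -0.10]

print(f"agreement: {agreement_score(platform_a, platform_b):.3f}")
print(f"uncertainty zone: rounds {uncertain_rounds(platform_a, platform_b)}")
```

Here the two platforms track each other for three rounds and split in the fourth, so the overall score stays high while round 4 is flagged for caution.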

BettaFish + MiroFish: The Complete Pipeline

[BettaFish]                        [MiroFish]
Weibo / Twitter / Reddit / ...  →  Seed data → Knowledge graph
Sentiment data collection       →  Agent initialization
30+ platforms                   →  Simulation → Report → Prediction

Together, the two projects form a full chain: from real-world data collection to future trend forecasting.

Quick Start

Requirements: Python 3.10+, Node.js 16+, Docker (recommended)

# Clone
git clone https://github.com/666ghj/MiroFish.git
cd MiroFish

# Configure environment
cp .env.example .env
# Edit .env with:
# - OPENAI_API_KEY (or compatible API)
# - ZEP_API_KEY (Zep Cloud account)
# - GRAPHRAG_API_KEY

# Option 1: Docker one-command start (recommended)
docker-compose up -d

# Option 2: Manual startup
pip install -r requirements.txt
cd frontend && npm install && npm run build && cd ..
python app.py

Visit http://localhost:5000 to access the web UI.
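
Before starting the services, it can help to confirm that the keys listed above are actually filled in. The check below is a small standalone sketch that parses `.env`-style text; the key names are taken from the list in this article and `missing_keys` is a hypothetical helper, so adjust both to whatever `.env.example` in the repository contains.

```python
REQUIRED_KEYS = ("OPENAI_API_KEY", "ZEP_API_KEY", "GRAPHRAG_API_KEY")


def missing_keys(env_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in .env-style text."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return [key for key in required if not values.get(key)]


sample = """
# copied from .env.example
OPENAI_API_KEY=sk-test
ZEP_API_KEY=
"""

print(missing_keys(sample))
```

To check a real file, read it first, for example `missing_keys(open(".env").read())`.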

Your first simulation:

1. Upload seed documents (PDF or enter URLs)
2. Configure simulation parameters (agent count, rounds, topic)
3. Click "Start Simulation"
4. Wait ~10-30 minutes (depends on agent count and API speed)
5. Review the generated trend report; ask follow-up questions in natural language

Deep Dive

System Architecture

┌──────────────────────────────────────────────────────┐
│               Frontend Layer (Vue.js)                │
│  Config / Progress Monitor / Report Viewer / Chat    │
└──────────────────────┬───────────────────────────────┘
                       │ REST API
┌──────────────────────▼───────────────────────────────┐
│             Backend Layer (Flask + Python)           │
│  Five-stage pipeline orchestration / God-mode ctrl   │
└──────┬───────────────┬─────────────────┬─────────────┘
       │               │                 │
┌──────▼──────┐  ┌─────▼───────┐  ┌──────▼─────────────┐
│ Knowledge   │  │ Simulation  │  │  Memory Layer      │
│   Layer     │  │   Layer     │  │  Zep Cloud         │
│  GraphRAG   │  │  CAMEL-AI   │  │  Per-agent memory  │
│  Knowledge  │  │  OASIS      │  │  Cross-round       │
│  graph      │  │  Dual-plat  │  │  persistence       │
│  PyMuPDF    │  │  parallel   │  └────────────────────┘
└─────────────┘  └─────────────┘

CAMEL-AI OASIS: The Simulation Engine

CAMEL-AI OASIS is MiroFish's simulation core, purpose-built for social simulation:

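import random

# Conceptual sketch of the setup. The class names and parameters below
# (including InteractionConfig, PersonaProfile, and ZepMemory) are illustrative
# and may not match the published OASIS API.
from oasis import Environment, Agent, Platform

# Hypothetical occupation pool used when sampling personas
occupations = ["student", "engineer", "teacher", "retiree"]

# Initialize virtual platform
platform = Platform(
    name="simulated_weibo",
    max_agents=5000,
    interaction_rules=InteractionConfig(
        max_posts_per_round=10,
        follow_probability=0.3
    )
)

# Create agents with varied personas
agents = [
    Agent(
        id=i,
        persona=PersonaProfile(
            age=random.randint(18, 65),
            occupation=random.choice(occupations),
            political_lean=random.gauss(0, 1),
            activity_level=random.uniform(0.1, 1.0)
        ),
        memory=ZepMemory(session_id=f"agent_{i}"),
        knowledge_context=graphrag_context  # produced by the GraphRAG step
    )
    for i in range(5000)
]

# Launch simulation
env = Environment(platform=platform, agents=agents)
results = env.run(rounds=50, topic="EV subsidy policy")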
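
The persona sampling step can be tried on its own, without any simulation framework. The sketch below is a self-contained assumption: `PersonaProfile` is defined locally as a dataclass with the four fields used above, and `OCCUPATIONS` and `sample_personas` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonaProfile:
    age: int
    occupation: str
    political_lean: float  # standard-normal draw; the sign gives the direction
    activity_level: float  # 0.1 (rarely active) to 1.0 (very active)


OCCUPATIONS = ["student", "engineer", "teacher", "driver", "retiree"]


def sample_personas(n, seed=42):
    """Draw `n` personas from fixed distributions, reproducibly."""
    rng = random.Random(seed)  # a seeded generator makes runs repeatable
    return [
        PersonaProfile(
            age=rng.randint(18, 65),
            occupation=rng.choice(OCCUPATIONS),
            political_lean=rng.gauss(0, 1),
            activity_level=rng.uniform(0.1, 1.0),
        )
        for _ in range(n)
    ]


personas = sample_personas(5000)
active = sum(p.activity_level > 0.5 for p in personas)
print(f"{active} of {len(personas)} personas have activity_level above 0.5")
```

Using a dedicated `random.Random(seed)` instead of the module-level functions means two simulation runs can start from the same population, which matters when comparing scenarios.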

GraphRAG Knowledge Graph Construction

from graphrag import GraphRAGPipeline

# Build knowledge graph from seed documents
# (`GraphRAGPipeline` is an illustrative wrapper name, not a confirmed graphrag import)
pipeline = GraphRAGPipeline(
    input_dir="./seed_documents",
    output_dir="./knowledge_graph"
)

pipeline.run()
# Example output:
# Entities: [Tesla, BYD, Subsidy Policy, Consumer Groups, ...]
# Relations: [Tesla → competes → BYD]
#             [Subsidy Policy → stimulates → consumer purchase intent]
#             [Purchase intent → drives → market share]

# Inject knowledge graph into agent context
# (this is the value passed as knowledge_context when agents are created)
graphrag_context = pipeline.query("Key factors in the EV market")
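
One simple way to picture "injecting" graph knowledge is to filter the extracted relations by topic and render them as lines in the agent's prompt. The sketch below works on hand-written triples mirroring the example output above; `TRIPLES` and `build_agent_context` are assumptions and do not use the GraphRAG library.

```python
TRIPLES = [
    ("Tesla", "competes with", "BYD"),
    ("Subsidy Policy", "stimulates", "consumer purchase intent"),
    ("consumer purchase intent", "drives", "market share"),
]


def build_agent_context(triples, focus_terms, max_facts=5):
    """Keep triples that mention any focus term and render them as prompt lines."""
    terms = [term.lower() for term in focus_terms]
    relevant = [
        (subj, rel, obj)
        for subj, rel, obj in triples
        if any(term in subj.lower() or term in obj.lower() for term in terms)
    ]
    lines = [f"- {subj} {rel} {obj}" for subj, rel, obj in relevant[:max_facts]]
    return "Known background facts:\n" + "\n".join(lines)


print(build_agent_context(TRIPLES, ["subsidy", "purchase"]))
```

Capping the number of facts with `max_facts` keeps the per-agent prompt small, which matters when thousands of agents each make LLM calls every round.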

Report Generation: From Trajectories to Insights

def generate_report(simulation_results):
    """Generate trend report from simulation trajectories"""

    # 1. Aggregate statistics
    sentiment_evolution = aggregate_sentiment(simulation_results)
    opinion_clusters = cluster_opinions(simulation_results)
    key_events = detect_tipping_points(simulation_results)

    # 2. LLM synthesis
    report_prompt = f"""
    Based on the following simulation data, generate a trend analysis report:
    - Sentiment evolution curve: {sentiment_evolution}
    - Opinion clusters: {opinion_clusters}
    - Key tipping points: {key_events}

    Analyze: collective final stance, key driving factors, likely evolution paths
    """

    report = llm.complete(report_prompt)

    # 3. Dual-platform consistency check
    confidence = calculate_confidence(
        results_platform_a=simulation_results["platform_a"],
        results_platform_b=simulation_results["platform_b"]
    )

    return Report(content=report, confidence=confidence)
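
The helpers called in `generate_report` are not shown in the article. As a hedged illustration of what two of them might do, the sketch below averages per-agent sentiment for each round and flags rounds where the mean jumps sharply; the input shape (a list of per-round score lists) and the `threshold` parameter are assumptions.

```python
def aggregate_sentiment(rounds):
    """Mean sentiment per round; `rounds` is a list of per-agent score lists."""
    return [sum(scores) / len(scores) for scores in rounds]


def detect_tipping_points(curve, threshold=0.2):
    """1-indexed rounds whose mean moved more than `threshold` from the previous round."""
    return [
        i + 1
        for i in range(1, len(curve))
        if abs(curve[i] - curve[i - 1]) > threshold
    ]


rounds = [
    [0.1, 0.2, 0.0],     # round 1: mildly positive
    [0.1, 0.3, 0.2],     # round 2: slightly more positive
    [-0.4, -0.2, -0.3],  # round 3: sentiment flips
    [-0.3, -0.3, -0.3],  # round 4: stays negative
]

curve = aggregate_sentiment(rounds)
print([round(value, 2) for value in curve])
print(detect_tipping_points(curve))
```

Reducing the raw trajectories to a short curve and a handful of flagged rounds keeps the LLM prompt compact and gives the report concrete moments to explain.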

Frontend: Vue.js Visualization

The Vue.js frontend provides a full visualization suite:

Simulation progress monitor: Real-time stage progress and agent activity levels
Knowledge graph explorer: Interactive entity-relationship graph from GraphRAG
Sentiment heatmap: Group emotional distribution evolving across simulation rounds
Report reader: Highlighted, annotatable report with natural language follow-up chat

Why AGPL-3.0?

AGPL-3.0 is stricter than GPL in a specific way:

MIT/Apache:  Modifications can stay closed source (can commercialize without sharing)
GPL:         Distributions of modified code must be open-sourced
AGPL:        Even network services (SaaS) running modified code must offer the modified source to their users

→ MiroFish with AGPL means: if you build a cloud service on a modified version, you must
  publish your changes. Private research and local use that you neither distribute nor
  offer over a network carry no such obligation.

Resources

Official

🌟 GitHub: https://github.com/666ghj/MiroFish
🐟 Sister project: BettaFish (sentiment collection, 40.5k ⭐)
📄 OASIS framework: https://github.com/camel-ai/oasis
🧠 Zep Cloud: https://www.getzep.com
📊 GraphRAG: https://github.com/microsoft/graphrag

Related Technologies

CAMEL-AI: https://github.com/camel-ai/camel — multi-agent framework foundation
PyMuPDF: PDF document parsing
Docker: Recommended deployment method

Summary

Key Takeaways

Swarm simulation replaces statistical fitting: MiroFish doesn't fit historical data — it re-enacts group interaction. The output is a behavioral evolution story, not a point prediction
GraphRAG knowledge enhancement: Graph-structured knowledge representation gives agents domain-level causal reasoning, not just fact retrieval
Zep Cloud cross-round memory: Addresses simulation continuity, so each agent's behavior stays consistent with what it has already seen and said
God-mode what-if analysis: Runtime variable injection turns a prediction tool into a decision support tool
Dual-platform confidence mechanism: Consistency-based confidence scoring is more honest about uncertainty than a single prediction
BettaFish + MiroFish closed loop: Complete pipeline from real-world data collection to future trend forecasting

Who Should Use This

Researchers: Scholars in public opinion dynamics, social simulation, and computational social science
Product managers / market analysts: Decision-makers who need to anticipate market reactions and run scenario analysis
AI engineers: Developers studying multi-agent simulation architectures, GraphRAG applications, and agent memory systems
Independent researchers: Explorers interested in swarm intelligence emergence and using AI to understand complex social systems

A Question Worth Sitting With

MiroFish represents a new paradigm for AI prediction: from "model fitting" to "world simulation". When we re-enact the real world with multi-agent systems, we are working from a deeper premise: that reality is the emergent result of countless agents interacting in dynamic equilibrium. Perhaps truly understanding that emergence requires simulation of comparable complexity.

A school of fish has no central director, yet forms spectacular collective patterns. Is the trajectory of public opinion shaped by the same invisible hand?
