Introduction
"Don't predict individuals — simulate the swarm."
This is article No. 43 in the "One Open Source Project a Day" series. Today's project is MiroFish (GitHub).
The standard playbook for prediction tools is: collect data → train a model → output a number. But this approach has a fundamental blind spot: models are static, while the real world emerges from dynamic interaction. Public sentiment, market behavior, policy responses — these are collective phenomena that emerge from countless individual interactions. You can't fit a public opinion storm with linear regression.
MiroFish takes a different approach entirely: instead of fitting, it re-enacts. By running thousands of virtual "people" through a simulated platform, it lets group behavior evolve naturally and extracts the trajectory as a prediction. This isn't predicting numbers — it's predicting stories.
56k+ Stars, 8.6k+ Forks — one of the most-watched projects in the multi-agent simulation space. The author is 666ghj (BaiFu), a student at Beijing University of Posts and Telecommunications with support from Shanda Group, and also the creator of BettaFish (40.5k Stars). Together, the two projects form a complete "data collection → simulation prediction" pipeline.
What You'll Learn
- MiroFish's core philosophy: why swarm simulation can capture dynamics that statistical prediction misses
- The five-stage simulation pipeline: from knowledge graph construction to deep interactive reporting
- GraphRAG's role in simulation: injecting domain knowledge into agents
- Zep Cloud cross-session memory: giving agents a persistent sense of history
- "God-mode" variable injection: runtime what-if analysis
Prerequisites
- Basic understanding of Multi-Agent Systems (MAS)
- Python basics (optional, for understanding configuration logic)
- Interest in public opinion analysis or trend forecasting
Project Background
What Is It?
MiroFish is a swarm intelligence prediction engine that builds a virtual society populated by thousands of AI agents, simulates how a real crowd would respond to a given topic, and generates a structured trend forecast report from the emergent behavior.
The name is inspired by the emergent phenomenon of collective intelligence — just as a school of fish (Fish) can form patterns far exceeding individual capability, human social dynamics emerge from individual interactions into collective outcomes that no individual planned.
The core problem it solves:
Traditional approach:
Historical data → Statistical model → "Next month's sales will be X"
Problem: Can't explain why; can't handle black-swan events
MiroFish approach:
Seed knowledge + Multi-agent simulation → Group behavior evolution
→ "Under these conditions, the crowd will respond like this"
Advantage: Explainable, intervenable, supports what-if analysis
About the Author
- GitHub: 666ghj
- Background: Student at Beijing University of Posts and Telecommunications; supported by Shanda Group
- Sister project: BettaFish (40.5k ⭐) — collects sentiment data from 30+ platforms, feeding into MiroFish as the data source
- Vision: Complete loop: BettaFish collects → MiroFish simulates and predicts
Project Stats
- ⭐ GitHub Stars: 56,400+
- 🍴 Forks: 8,600+
- 📦 Latest Version: v0.1.2
- 📄 License: AGPL-3.0 (copyleft; SaaS deployments must open-source modifications)
- 🌐 Language breakdown: Python 57.6% + Vue.js 41.2%
- 🤝 Core dependencies: CAMEL-AI OASIS, camel-ai 0.2.78, Zep Cloud 3.13.0, GraphRAG, PyMuPDF
Key Features
The Core: Five-Stage Simulation Pipeline
MiroFish runs through five strictly sequential stages:
Stage 1: Graph Building
  Seed documents (PDF/URL) → PyMuPDF extraction → GraphRAG knowledge graph
  Output: Domain knowledge graph (entities + relationships)
Stage 2: Environment Setup
  CAMEL-AI OASIS initializes virtual platforms
  Each agent is assigned a Zep Cloud long-term memory
  Knowledge graph context injected into agent context
Stage 3: Parallel Simulation
  Dual-platform simultaneous run (for confidence validation)
  Thousands of agents interact across N rounds
  Agent behavior driven by LLM + constrained by persona profiles
Stage 4: Report Generation
  Aggregate simulation trajectories
  LLM summarizes collective behavior patterns
  Generate structured trend report
Stage 5: Deep Interaction
  User can ask natural-language questions about the report
  Supports "God-mode" variable injection (runtime what-if)
  RAG retrieves simulation records to answer questions
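To make the "strictly sequential" ordering concrete, here is a minimal sketch of an orchestrator in which each stage can read the outputs of the stages before it. The stage names mirror the list above; `PipelineState`, `run_pipeline`, and the lambda bodies are hypothetical stand-ins, not MiroFish's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    """Accumulates each stage's output so later stages can read earlier ones."""
    outputs: dict = field(default_factory=dict)
    completed: list = field(default_factory=list)


def run_pipeline(stages: list[tuple[str, Callable[[PipelineState], object]]]) -> PipelineState:
    """Run stages strictly in order; an exception stops everything downstream."""
    state = PipelineState()
    for name, stage in stages:
        state.outputs[name] = stage(state)
        state.completed.append(name)
    return state


# Hypothetical stand-ins for the five stages described above.
stages = [
    ("graph_building", lambda s: {"entities": ["Policy X", "Group Y"]}),
    ("environment_setup", lambda s: {"agents": 3, "graph": s.outputs["graph_building"]}),
    ("parallel_simulation", lambda s: {"rounds": 5, "agents": s.outputs["environment_setup"]["agents"]}),
    ("report_generation", lambda s: f"Simulated {s.outputs['parallel_simulation']['rounds']} rounds"),
    ("deep_interaction", lambda s: {"report": s.outputs["report_generation"]}),
]

state = run_pipeline(stages)
print(state.completed)
print(state.outputs["report_generation"])
```

Because every stage receives the shared state, a later stage such as report generation depends only on what earlier stages recorded, which is what makes the ordering matter.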
GraphRAG: Injecting Domain Knowledge
MiroFish uses GraphRAG instead of standard RAG — and the reason is intuitive:
# Standard RAG: document → vector → retrieve similar chunks
# Problem: can only answer "what are the facts," can't reason about relationships
# GraphRAG: document → entity extraction → relationship graph → graph traversal
# Advantage: can reason "A affects B, what does B do to C?"
In simulation, agents need to understand complex causal chains (e.g., "Policy X influences Group Y's behavior"). GraphRAG's graph structure handles this kind of relational reasoning far better than vector retrieval alone.
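The difference is easier to see with a toy graph. The sketch below stores a few relations as an adjacency map and uses breadth-first search to recover a multi-hop chain, something a similarity lookup over isolated chunks does not give you directly. The edges and the `causal_chain` helper are illustrative assumptions, not GraphRAG's internals.

```python
from collections import deque

# Hypothetical edges, modeled on the kind of relations a knowledge graph holds.
EDGES = {
    "Subsidy Policy": [("stimulates", "Purchase Intent")],
    "Purchase Intent": [("drives", "Market Share")],
    "Tesla": [("competes with", "BYD")],
}


def causal_chain(start, goal):
    """Breadth-first search returning the relation path from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in EDGES.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None


chain = causal_chain("Subsidy Policy", "Market Share")
for source, relation, target in chain:
    print(f"{source} --{relation}--> {target}")
```

The two-hop answer (policy, then purchase intent, then market share) comes from following edges rather than from matching text, which is the kind of reasoning the agents rely on.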
Zep Cloud: Agents That Remember
Every agent gets its own Zep Cloud memory space:
# Each agent has persistent memory
# (illustrative; zep_client is assumed to be an already-configured Zep Cloud client)
agent_memory = ZepMemory(
    session_id=f"agent_{agent_id}",
    zep_client=zep_client
)
# Between simulation rounds, agents can "remember" prior interactions
# This keeps agent behavior coherent and closer to real human cognition
This solves a classic problem in multi-agent simulation: if agents "forget" everything between rounds, their behavior lacks continuity and prediction credibility drops significantly.
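As a rough illustration of what cross-round memory buys you, the following sketch keeps a per-agent event log in a plain dictionary. It is a local stand-in written for this article, not the Zep Cloud API; the class and method names are hypothetical.

```python
from collections import defaultdict


class InMemoryAgentMemory:
    """Local stand-in for a per-agent memory service; not the Zep API."""

    def __init__(self):
        self._sessions = defaultdict(list)

    def remember(self, agent_id, round_no, event):
        """Append an event to the agent's session log."""
        self._sessions[f"agent_{agent_id}"].append((round_no, event))

    def recall(self, agent_id, last_n=3):
        """Return the most recent events for an agent, oldest first."""
        return self._sessions[f"agent_{agent_id}"][-last_n:]


memory = InMemoryAgentMemory()
for round_no in range(1, 6):
    memory.remember(7, round_no, f"saw post about topic in round {round_no}")

# In round 6, agent 7's prompt could include what it experienced recently.
recent = memory.recall(7, last_n=2)
print(recent)
```

A hosted memory service adds persistence and retrieval on top of this idea, but the contract is the same: what an agent saw in round 5 is available when it acts in round 6.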
"God-Mode" Variable Injection
One of MiroFish's most distinctive features — injecting external variables at runtime:
# Example: "What if a competitor suddenly cuts prices by 20%?"
god_mode_injection = {
    "event": "competitor_price_cut",
    "magnitude": -0.20,
    "timing": "round_15",
    "affected_agents": "all_consumer_agents"
}
# After injection, the simulation responds in real-time to this "external shock"
# Output: Group sentiment shift + behavioral evolution trajectory
This transforms MiroFish from a "what will happen" tool into a "what happens if I do this" decision support tool.
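A toy model shows the shape of a what-if comparison: run the same seeded simulation twice, once with a shock injected at round 15, and compare the trajectories. The update rule (agents drifting toward the group mean) and the simplified injection keys `round` and `magnitude` are assumptions made for this sketch; MiroFish's agents are LLM-driven and far richer.

```python
import random


def simulate(rounds, n_agents, injection=None, seed=0):
    """Toy sentiment model: agents drift toward the group mean; a shock shifts everyone."""
    rng = random.Random(seed)
    sentiments = [rng.uniform(-0.2, 0.6) for _ in range(n_agents)]
    trajectory = []
    for round_no in range(1, rounds + 1):
        if injection and round_no == injection["round"]:
            # Apply the external shock to every agent at the chosen round.
            sentiments = [s + injection["magnitude"] for s in sentiments]
        mean = sum(sentiments) / n_agents
        # Each agent moves 10% of the way toward the current group mean.
        sentiments = [s + 0.1 * (mean - s) for s in sentiments]
        trajectory.append(sum(sentiments) / n_agents)
    return trajectory


baseline = simulate(rounds=30, n_agents=200)
shocked = simulate(rounds=30, n_agents=200, injection={"round": 15, "magnitude": -0.20})

print(f"round 14 difference: {shocked[13] - baseline[13]:+.2f}")
print(f"round 15 difference: {shocked[14] - baseline[14]:+.2f}")
```

Fixing the seed is the important design choice here: the two runs are identical until the injection, so any later difference can be attributed to the shock rather than to sampling noise.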
Dual-Platform Parallel Simulation
MiroFish runs the same simulation on two independent virtual platforms simultaneously:
Platform A ─── Results A ──┐
                           ├── Confidence scoring + combined report
Platform B ─── Results B ──┘
- Both platforms converge → high-confidence conclusion
- Platforms diverge → flagged as "uncertainty zone," users cautioned
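One simple way to turn "converge or diverge" into a number is to measure the average gap between the two platforms' trajectories. The scoring formula, the `tolerance` and `threshold` values, and the sample trajectories below are all assumptions for illustration; the article does not specify how MiroFish computes its confidence score.

```python
def agreement_confidence(results_a, results_b, tolerance=0.5):
    """Map the mean absolute gap between two trajectories to a 0..1 confidence score."""
    if len(results_a) != len(results_b) or not results_a:
        raise ValueError("trajectories must be non-empty and equally long")
    gap = sum(abs(a - b) for a, b in zip(results_a, results_b)) / len(results_a)
    return max(0.0, 1.0 - gap / tolerance)


def classify(confidence, threshold=0.8):
    """Label a score using the two outcomes described above."""
    return "high-confidence" if confidence >= threshold else "uncertainty zone"


# Hypothetical mean-sentiment trajectories from two platforms.
platform_a = [0.10, 0.20, 0.35, 0.50]
platform_b = [0.12, 0.18, 0.37, 0.48]
diverging_b = [0.10, 0.00, -0.30, -0.60]

for label, other in [("converging", platform_b), ("diverging", diverging_b)]:
    score = agreement_confidence(platform_a, other)
    print(label, round(score, 2), classify(score))
```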
BettaFish + MiroFish: The Complete Pipeline
[BettaFish] [MiroFish]
Weibo / Twitter / Reddit / ... → Seed data → Knowledge graph
Sentiment data collection → Agent initialization
30+ platforms → Simulation → Report → Prediction
Together, the two projects form a full chain: from real-world data collection to future trend forecasting.
Quick Start
Requirements: Python 3.10+, Node.js 16+, Docker (recommended)
# Clone
git clone https://github.com/666ghj/MiroFish.git
cd MiroFish
# Configure environment
cp .env.example .env
# Edit .env with:
# - OPENAI_API_KEY (or compatible API)
# - ZEP_API_KEY (Zep Cloud account)
# - GRAPHRAG_API_KEY
# Option 1: Docker one-command start (recommended)
docker-compose up -d
# Option 2: Manual startup
pip install -r requirements.txt
cd frontend && npm install && npm run build && cd ..
python app.py
Visit http://localhost:5000 to access the web UI.
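If startup fails, a missing API key is a likely first suspect. The sketch below checks `.env`-style text for the three keys listed in the configuration step. The parser is a deliberately small assumption-laden helper (it handles plain `KEY=VALUE` lines only), and the sample contents are made up.

```python
REQUIRED_KEYS = ("OPENAI_API_KEY", "ZEP_API_KEY", "GRAPHRAG_API_KEY")


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return required keys that are absent or set to an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


# Made-up example contents of a .env file.
sample = """
# example contents of .env
OPENAI_API_KEY=sk-example
ZEP_API_KEY=
"""
print(missing_keys(sample))
```

To check a real file, pass its contents instead of the sample, for example `missing_keys(open(".env").read())`.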
Your first simulation:
1. Upload seed documents (PDF or enter URLs)
2. Configure simulation parameters (agent count, rounds, topic)
3. Click "Start Simulation"
4. Wait ~10-30 minutes (depends on agent count and API speed)
5. Review the generated trend report; ask follow-up questions in natural language
Deep Dive
System Architecture
┌──────────────────────────────────────────────────────┐
│               Frontend Layer (Vue.js)                │
│   Config / Progress Monitor / Report Viewer / Chat   │
└──────────────────────┬───────────────────────────────┘
                       │ REST API
┌──────────────────────▼───────────────────────────────┐
│            Backend Layer (Flask + Python)            │
│  Five-stage pipeline orchestration / God-mode ctrl   │
└──────┬──────────────┬───────────────┬────────────────┘
       │              │               │
┌──────▼──────┐ ┌─────▼──────┐ ┌──────▼─────────────┐
│ Knowledge   │ │ Simulation │ │ Memory Layer       │
│ Layer       │ │ Layer      │ │ Zep Cloud          │
│ GraphRAG    │ │ CAMEL-AI   │ │ Per-agent memory   │
│ Knowledge   │ │ OASIS      │ │ Cross-round        │
│ graph       │ │ Dual-plat  │ │ persistence        │
│ PyMuPDF     │ │ parallel   │ └────────────────────┘
└─────────────┘ └────────────┘
CAMEL-AI OASIS: The Simulation Engine
CAMEL-AI OASIS is MiroFish's simulation core, purpose-built for social simulation:
import random
from oasis import Environment, Agent, Platform
# Illustrative pseudocode: InteractionConfig, PersonaProfile, ZepMemory,
# occupations, and graphrag_context are assumed to be defined elsewhere,
# and the class names may not match the OASIS API exactly.
# Initialize virtual platform
platform = Platform(
    name="simulated_weibo",
    max_agents=5000,
    interaction_rules=InteractionConfig(
        max_posts_per_round=10,
        follow_probability=0.3
    )
)
# Create agents with varied personas
agents = [
    Agent(
        id=i,
        persona=PersonaProfile(
            age=random.randint(18, 65),
            occupation=random.choice(occupations),
            political_lean=random.gauss(0, 1),
            activity_level=random.uniform(0.1, 1.0)
        ),
        memory=ZepMemory(session_id=f"agent_{i}"),
        knowledge_context=graphrag_context
    )
    for i in range(5000)
]
# Launch simulation
env = Environment(platform=platform, agents=agents)
results = env.run(rounds=50, topic="EV subsidy policy")
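The persona-sampling step can be tried on its own without any simulation framework. This is a self-contained sketch using only the standard library: the `PersonaProfile` dataclass here is a local definition with the same four fields as above, the occupation list is hypothetical, and a seeded generator is used so that a run can be repeated exactly.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonaProfile:
    """Local stand-in with the four persona fields used in the listing above."""
    age: int
    occupation: str
    political_lean: float
    activity_level: float


OCCUPATIONS = ["student", "engineer", "teacher", "retailer"]  # hypothetical list


def sample_personas(n, seed=42):
    """Draw n personas from a seeded generator so runs can be repeated exactly."""
    rng = random.Random(seed)
    return [
        PersonaProfile(
            age=rng.randint(18, 65),
            occupation=rng.choice(OCCUPATIONS),
            political_lean=rng.gauss(0, 1),
            activity_level=rng.uniform(0.1, 1.0),
        )
        for _ in range(n)
    ]


personas = sample_personas(5000)
mean_age = sum(p.age for p in personas) / len(personas)
print(len(personas), round(mean_age, 1))
```

Seeding matters more than it might seem: if two runs are meant to be compared, differences in the sampled population should not be mistaken for differences in behavior.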
GraphRAG Knowledge Graph Construction
# Illustrative pseudocode: GraphRAGPipeline is a simplified stand-in
# and may not match the graphrag package's actual interface.
from graphrag import GraphRAGPipeline
# Build knowledge graph from seed documents
pipeline = GraphRAGPipeline(
    input_dir="./seed_documents",
    output_dir="./knowledge_graph"
)
pipeline.run()
# Example output:
# Entities: [Tesla, BYD, Subsidy Policy, Consumer Groups, ...]
# Relations: [Tesla → competes → BYD]
#            [Subsidy Policy → stimulates → consumer purchase intent]
#            [Purchase intent → drives → market share]
# Inject knowledge graph into agent context
context = pipeline.query("Key factors in the EV market")
Report Generation: From Trajectories to Insights
def generate_report(simulation_results):
    """Generate trend report from simulation trajectories"""
    # 1. Aggregate statistics
    sentiment_evolution = aggregate_sentiment(simulation_results)
    opinion_clusters = cluster_opinions(simulation_results)
    key_events = detect_tipping_points(simulation_results)
    # 2. LLM synthesis
    report_prompt = f"""
    Based on the following simulation data, generate a trend analysis report:
    - Sentiment evolution curve: {sentiment_evolution}
    - Opinion clusters: {opinion_clusters}
    - Key tipping points: {key_events}
    Analyze: collective final stance, key driving factors, likely evolution paths
    """
    report = llm.complete(report_prompt)
    # 3. Dual-platform consistency check
    confidence = calculate_confidence(
        results_platform_a=simulation_results["platform_a"],
        results_platform_b=simulation_results["platform_b"]
    )
    return Report(content=report, confidence=confidence)
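The aggregation helpers are left undefined in the listing above. As one possible reading, the sketch below averages sentiment per round and flags rounds where the mean jumps sharply. The function bodies, the `threshold` value, the input shape (a list of per-round score lists), and the sample data are all assumptions; the real helpers may work quite differently.

```python
def aggregate_sentiment(rounds):
    """Average per-agent sentiment scores within each round."""
    return [sum(scores) / len(scores) for scores in rounds]


def detect_tipping_points(curve, threshold=0.3):
    """Return 1-based round numbers where the mean jumps by more than threshold."""
    return [
        i + 1
        for i in range(1, len(curve))
        if abs(curve[i] - curve[i - 1]) > threshold
    ]


# Hypothetical data: three agents' sentiment scores over five rounds.
rounds = [
    [0.2, 0.3, 0.1],
    [0.2, 0.4, 0.3],
    [0.3, 0.3, 0.3],
    [-0.4, -0.2, -0.3],
    [-0.3, -0.3, -0.3],
]

curve = aggregate_sentiment(rounds)
print([round(v, 2) for v in curve])
print(detect_tipping_points(curve))
```

Passing compact summaries like these to the LLM, rather than raw interaction logs, keeps the synthesis prompt small and makes the report traceable to specific rounds.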
Frontend: Vue.js Visualization
The Vue.js frontend provides a full visualization suite:
- Simulation progress monitor: Real-time stage progress and agent activity levels
- Knowledge graph explorer: Interactive entity-relationship graph from GraphRAG
- Sentiment heatmap: Group emotional distribution evolving across simulation rounds
- Report reader: Highlighted, annotatable report with natural language follow-up chat
Why AGPL-3.0?
AGPL-3.0 is stricter than GPL in a specific way:
MIT/Apache: Modifications can stay closed source (can commercialize without sharing)
GPL:        Distributing modified code requires releasing its source
AGPL:       Even offering modified code as a network service (SaaS) requires releasing its source
→ MiroFish with AGPL means: if you build a cloud service on a modified version, you must
  open-source your changes. Private research and local use that you neither distribute nor
  offer over a network do not trigger this obligation.
Resources
Official
- 🌟 GitHub: https://github.com/666ghj/MiroFish
- 🐟 Sister project: BettaFish (sentiment collection, 40.5k ⭐)
- 📄 OASIS framework: https://github.com/camel-ai/oasis
- 🧠 Zep Cloud: https://www.getzep.com
- 📊 GraphRAG: https://github.com/microsoft/graphrag
Related Technologies
- CAMEL-AI: https://github.com/camel-ai/camel — multi-agent framework foundation
- PyMuPDF: PDF document parsing
- Docker: Recommended deployment method
Summary
Key Takeaways
- Swarm simulation replaces statistical fitting: MiroFish doesn't fit historical data — it re-enacts group interaction. The output is a behavioral evolution story, not a point prediction
- GraphRAG knowledge enhancement: Graph-structured knowledge representation gives agents domain-level causal reasoning, not just fact retrieval
- Zep Cloud cross-round memory: Addresses simulation continuity, so agents act on what they experienced in earlier rounds instead of starting fresh each time
- God-mode what-if analysis: Runtime variable injection turns a prediction tool into a decision support tool
- Dual-platform confidence mechanism: Consistency-based confidence scoring is more honest about uncertainty than a single prediction
- BettaFish + MiroFish closed loop: Complete pipeline from real-world data collection to future trend forecasting
Who Should Use This
- Researchers: Scholars in public opinion dynamics, social simulation, and computational social science
- Product managers / market analysts: Decision-makers who need to anticipate market reactions and run scenario analysis
- AI engineers: Developers studying multi-agent simulation architectures, GraphRAG applications, and agent memory systems
- Independent researchers: Explorers interested in swarm intelligence emergence and using AI to understand complex social systems
A Question Worth Sitting With
MiroFish represents a new paradigm for AI prediction: from "model fitting" to "world simulation". Re-enacting the real world with multi-agent systems raises a deeper question: if reality is fundamentally the emergent result of countless agents in dynamic equilibrium, does truly understanding that emergence require simulation of comparable complexity?
A school of fish has no central director, yet forms spectacular collective patterns. Is the trajectory of public opinion shaped by the same invisible hand?