civStation - a VLM system for playing Civilization VI via strategy-level natural language
Reddit r/LocalLLaMA / 3/31/2026
💬 Opinion | Signals & Early Trends | Tools & Practical Usage | Models & Research
Key Points
- civStation is an experimental “computer-use” VLM system that plays Civilization VI by translating strategy-level natural-language instructions (e.g., “focus on economy” or “aim for a science victory”) into concrete in-game actions.
- The system uses a three-layer design: Strategy (intent/goal planning and decomposition), Action (VLM-based screen interpretation plus mouse/keyboard execution without a game API), and HITL (human-in-the-loop overrides for real-time control).
- Rather than relying on a single action sequence, it plans one strategy and then generates multiple possible action sequences per task, typically requiring about 2–16 model calls.
- Execution is implemented via sub-agents for bounded gameplay tasks (such as city management or unit control), and the project emphasizes shifting the interaction model from specifying individual actions to expressing intent (“action → intent”), moving toward delegation and agent orchestration.
- Key challenges highlighted include VLM perception errors, execution drift across multi-step play, and limited verification reliability, alongside latency/API-cost trade-offs from multi-step calling and fallback behaviors.
- The project’s central goal is not only automated gameplay, but also improving the human–system interface by enabling strategy-level control in UI-only environments.
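
The layering described in the key points can be pictured with a minimal Python sketch. This is not civStation's code: every name here (`plan_strategy`, `run_task`, `CallBudget`, the `Task` and `Action` types) is hypothetical, the planner is a keyword stub standing in for a model, and the proposer and verifier are fakes standing in for VLM calls. It only illustrates the flow of one strategy, several candidate action sequences per task, a verification pass, a human override hook, and a cap on model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MAX_MODEL_CALLS = 16  # upper end of the "about 2 to 16 calls" range in the summary


@dataclass(frozen=True)
class Task:
    sub_agent: str  # bounded gameplay area, e.g. "city_management"
    goal: str


@dataclass(frozen=True)
class Action:
    kind: str    # "click" or "key"
    target: str  # hypothetical UI element label or key name


def plan_strategy(instruction: str) -> List[Task]:
    """Strategy layer: a keyword stub standing in for a model-based planner."""
    text = instruction.lower()
    tasks = []
    if "economy" in text:
        tasks.append(Task("city_management", "prioritize gold and trade"))
    if "science" in text:
        tasks.append(Task("research", "prioritize science output"))
    return tasks or [Task("unit_control", "explore nearby territory")]


class CallBudget:
    """Counts (simulated) model calls and refuses to exceed the limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def spend(self) -> None:
        if self.used >= self.limit:
            raise RuntimeError("model call budget exhausted")
        self.used += 1


def run_task(
    task: Task,
    propose: Callable[[Task], List[List[Action]]],
    verify: Callable[[Task, List[Action]], bool],
    budget: CallBudget,
    override: Optional[Callable[[Task], Optional[List[Action]]]] = None,
) -> List[Action]:
    """Action layer plus HITL hook: a human override wins, otherwise the
    first candidate sequence that passes verification is chosen."""
    if override is not None:
        human_choice = override(task)
        if human_choice is not None:
            return human_choice
    budget.spend()              # one call proposes several candidates
    for sequence in propose(task):
        budget.spend()          # one call per verification attempt
        if verify(task, sequence):
            return sequence
    return []                   # nothing verified; the caller picks a fallback


# Fakes standing in for VLM-backed proposal and verification.
def fake_propose(task: Task) -> List[List[Action]]:
    return [
        [Action("click", "wrong_panel")],
        [Action("click", task.sub_agent), Action("key", "enter")],
    ]


def fake_verify(task: Task, sequence: List[Action]) -> bool:
    return bool(sequence) and sequence[0].target == task.sub_agent


budget = CallBudget(MAX_MODEL_CALLS)
tasks = plan_strategy("Focus on economy")
chosen = run_task(tasks[0], fake_propose, fake_verify, budget)
```

In this sketch the first candidate fails verification and the second is chosen, so three simulated calls are spent. A real system would replace the fakes with screenshot-driven model calls and dispatch the chosen actions to the mouse and keyboard, which is where the perception errors and execution drift mentioned above would enter.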



