Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning
arXiv cs.AI / 5/4/2026
Key Points
- The paper explores how to train vision-language models (VLMs) with reinforcement learning for long-horizon, interactive decision-making in Super Mario Land, where episodes require 100+ turns.
- It analyzes key RL algorithm components and proposes a modified PPO approach using a lightweight turn-level critic to improve training stability and sample efficiency versus critic-free alternatives.
- Experiments show that starting from pretrained VLMs supplies strong action priors, which boosts sample efficiency and reduces the need for manual action engineering compared with training deep RL from scratch.
- The authors introduce Odysseus, an open training framework for VLM agents, reporting sizable in-game gains (including at least 3x average game progress over frontier models) and better cross-game generalization while preserving general-domain abilities.
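The turn-level critic mentioned above can be illustrated with a short sketch. The paper's exact formulation is not given here, so the following is an assumption-laden example: it computes one Generalized Advantage Estimation (GAE) advantage per game turn (rather than per token), which a PPO update could then broadcast to every token the VLM emitted in that turn. The `Turn` dataclass and `turn_level_gae` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    reward: float  # scalar reward observed for this turn
    value: float   # lightweight critic's value estimate at this turn
    done: bool     # whether the episode ended on this turn


def turn_level_gae(turns: List[Turn], gamma: float = 0.99,
                   lam: float = 0.95) -> List[float]:
    """GAE computed over whole turns instead of individual tokens.

    Returns one advantage per turn; in a PPO-style update each
    advantage would be shared by all tokens generated in that turn.
    """
    advantages = [0.0] * len(turns)
    gae = 0.0
    for t in reversed(range(len(turns))):
        last = t == len(turns) - 1
        next_value = 0.0 if last else turns[t + 1].value
        next_nonterminal = 0.0 if turns[t].done else 1.0
        # TD residual at the turn level
        delta = (turns[t].reward
                 + gamma * next_value * next_nonterminal
                 - turns[t].value)
        gae = delta + gamma * lam * next_nonterminal * gae
        advantages[t] = gae
    return advantages
```

Estimating values once per turn rather than per token keeps the critic cheap, which is consistent with the paper's stated goal of improving stability and sample efficiency over critic-free alternatives.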