
COvolve: Adversarial Co-Evolution of Large-Language-Model-Generated Policies and Environments via Two-Player Zero-Sum Game

arXiv cs.AI / 2026-03-31


Key Points

  • COvolve is a co-evolutionary framework in which an LLM automatically generates both environments (environment code) and agent policies (policy code), addressing the problem of static or hand-crafted training environments.
  • The interaction between the environment designer and the policy designer is formulated as a two-player zero-sum game: environments are generated to exploit policy weaknesses, and policies co-evolve to adapt in response.
  • This co-evolution induces an automated curriculum in which environments and policies grow in complexity together, aiming at open-ended learning without a predefined task distribution.
  • For robustness and to mitigate forgetting, the framework computes the mixed-strategy Nash equilibrium (MSNE) of the zero-sum game, yielding a meta-policy over multiple environments that retains solutions to known environments while also handling unseen ones.
  • Experiments on urban driving, symbolic maze-solving, and geometric navigation show that LLM-driven co-evolution can generate progressively more complex environments.
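The MSNE of a finite two-player zero-sum game can be approximated by standard methods such as fictitious play, in which each player repeatedly best-responds to the opponent's empirical mixture and the time-averaged strategies converge to equilibrium. The sketch below illustrates this on a toy payoff matrix (matching pennies); it is an assumption for illustration only, not the paper's actual environment/policy payoffs or solver.

```python
# Fictitious play on a two-player zero-sum game: each player best-responds to
# the opponent's empirical play frequencies; the averaged strategies converge
# to a mixed-strategy Nash equilibrium (MSNE).
# The payoff matrix is a toy stand-in (matching pennies), NOT COvolve's
# environment-vs-policy payoffs.

def fictitious_play(payoff, iters=20000):
    """payoff[i][j]: row player's payoff; the column player receives -payoff[i][j]."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] = col_counts[0] = 1  # arbitrary initial plays
    for _ in range(iters):
        # Row player maximizes expected payoff against the column mixture.
        row_br = max(range(n_rows),
                     key=lambda i: sum(payoff[i][j] * col_counts[j]
                                       for j in range(n_cols)))
        # Column player minimizes the row player's expected payoff.
        col_br = min(range(n_cols),
                     key=lambda j: sum(payoff[i][j] * row_counts[i]
                                       for i in range(n_rows)))
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    return ([c / sum(row_counts) for c in row_counts],
            [c / sum(col_counts) for c in col_counts])

# Matching pennies: the unique MSNE is (1/2, 1/2) for both players.
pennies = [[1, -1], [-1, 1]]
row_mix, col_mix = fictitious_play(pennies)
```

In COvolve's setting, the rows would correspond to candidate policies and the columns to generated environments; the resulting row mixture is the meta-policy that hedges across all environments seen so far.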

Abstract

A central challenge in building continually improving agents is that training environments are typically static or manually constructed. This restricts continual learning and generalization beyond the training distribution. We address this with COvolve, a co-evolutionary framework that leverages large language models (LLMs) to generate both environments and agent policies, expressed as executable Python code. We model the interaction between environment and policy designers as a two-player zero-sum game, ensuring adversarial co-evolution in which environments expose policy weaknesses and policies adapt in response. This process induces an automated curriculum in which environments and policies co-evolve toward increasing complexity. To guarantee robustness and prevent forgetting as the curriculum progresses, we compute the mixed-strategy Nash equilibrium (MSNE) of the zero-sum game, thereby yielding a meta-policy. This MSNE meta-policy ensures that the agent does not forget to solve previously seen environments while learning to solve previously unseen ones. Experiments in urban driving, symbolic maze-solving, and geometric navigation showcase that COvolve produces progressively more complex environments. Our results demonstrate the potential of LLM-driven co-evolution to achieve open-ended learning without predefined task distributions or manual intervention.
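The adversarial loop described in the abstract can be sketched as follows. Here `propose_environment` and `propose_policy` are hypothetical stand-ins for the LLM code-generation steps, reduced to toy numeric mutations so the loop is self-contained and runnable; the real system generates executable Python environments and policies.

```python
# Hedged sketch of COvolve's adversarial co-evolution loop (assumptions:
# the proposer functions are toy numeric mutations, not real LLM calls).
import random

def evaluate(policy, env):
    """Policy's zero-sum payoff; the environment designer receives its negation."""
    return -abs(policy - env)  # toy score: the policy wants to match the env

def propose_environment(envs, policies):
    # Environment designer: perturb away from the currently strongest policy
    # to expose its weaknesses (stand-in for LLM-generated environment code).
    best_p = max(policies, key=lambda p: min(evaluate(p, e) for e in envs))
    return best_p + random.uniform(-2.0, 2.0) + 1.0

def propose_policy(envs, policies):
    # Policy designer: adapt toward the hardest current environment
    # (stand-in for LLM-generated policy code).
    hardest = min(envs, key=lambda e: max(evaluate(p, e) for p in policies))
    return hardest + random.uniform(-0.1, 0.1)

random.seed(0)
envs, policies = [0.0], [0.5]
for _ in range(10):
    envs.append(propose_environment(envs, policies))
    policies.append(propose_policy(envs, policies))
```

Each round enlarges the pools of environments and policies, which is exactly what makes the MSNE meta-policy step necessary: without mixing over the accumulated pool, the latest policy could forget how to handle earlier environments.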
