COvolve: Adversarial Co-Evolution of Large-Language-Model-Generated Policies and Environments via Two-Player Zero-Sum Game

arXiv cs.AI / 3/31/2026

Key Points

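COvolve is a co-evolutionary framework in which an LLM automatically generates both the environments (environment code) and the agent policies (policy code), aiming to overcome the limitation of static or hand-built training environments.
The interaction between the environment designer and the policy designer is formulated as a two-player zero-sum game: environments are generated to exploit policy weaknesses, and policies adapt in response, which drives the co-evolution.
Co-evolution induces an automated curriculum in which environments and policies grow in complexity together, targeting open-ended learning without a predefined task distribution.
For robustness and to limit forgetting, the mixed-strategy Nash equilibrium (MSNE) of the zero-sum game is computed to obtain a meta-policy over multiple environments, which retains solutions to previously seen environments while also handling unseen ones.
Experiments in urban driving, symbolic maze-solving, and geometric navigation show that LLM-driven co-evolution can generate progressively more complex environments.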
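
The adversarial loop described in the key points can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: an "environment" is reduced to an integer difficulty, a "policy" to an integer skill, and the two `design_*` functions are hypothetical stand-ins for the LLM designers, which in COvolve emit executable Python code instead. The sketch only shows the alternation (the environment designer targets a weakness, the policy designer adapts) and the resulting rise in complexity.

```python
from typing import List, Tuple


def solves(skill: int, difficulty: int) -> bool:
    """Toy success criterion: a policy solves an environment if its skill is high enough."""
    return skill >= difficulty


def design_environment(policies: List[int]) -> int:
    """Hypothetical stand-in for the LLM environment designer.

    Returns the easiest difficulty that no existing policy solves.
    """
    return max(policies) + 1


def design_policy(environments: List[int]) -> int:
    """Hypothetical stand-in for the LLM policy designer.

    Returns the smallest skill that solves every environment seen so far.
    """
    return max(environments)


def coevolve(rounds: int) -> Tuple[List[int], List[int]]:
    """Alternate the two designers for a fixed number of rounds."""
    policies = [0]          # start with a trivial policy
    environments: List[int] = []
    for _ in range(rounds):
        environments.append(design_environment(policies))  # expose a weakness
        policies.append(design_policy(environments))        # adapt in response
    return environments, policies


environments, policies = coevolve(4)
print(environments)  # [1, 2, 3, 4]
print(policies)      # [0, 1, 2, 3, 4]
```

In this toy the newest policy always solves every earlier environment, so forgetting cannot occur; with generated code that is not guaranteed, which is where the MSNE meta-policy comes in.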

Abstract

A central challenge in building continually improving agents is that training environments are typically static or manually constructed. This restricts continual learning and generalization beyond the training distribution. We address this with COvolve, a co-evolutionary framework that leverages large language models (LLMs) to generate both environments and agent policies, expressed as executable Python code. We model the interaction between environment and policy designers as a two-player zero-sum game, ensuring adversarial co-evolution in which environments expose policy weaknesses and policies adapt in response. This process induces an automated curriculum in which environments and policies co-evolve toward increasing complexity. To guarantee robustness and prevent forgetting as the curriculum progresses, we compute the mixed-strategy Nash equilibrium (MSNE) of the zero-sum game, thereby yielding a meta-policy. This MSNE meta-policy ensures that the agent does not forget to solve previously seen environments while learning to solve previously unseen ones. Experiments in urban driving, symbolic maze-solving, and geometric navigation showcase that COvolve produces progressively more complex environments. Our results demonstrate the potential of LLM-driven co-evolution to achieve open-ended learning without predefined task distributions or manual intervention.
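
To make the MSNE step concrete, here is a minimal sketch of approximating the mixed-strategy Nash equilibrium of a small zero-sum matrix game. The abstract does not say how the equilibrium is computed, so fictitious play is used here purely as a simple, dependency-free substitute, not as the authors' method. The payoff matrix is invented: rows are policies, columns are environments, and an entry is 1.0 when that policy solves that environment.

```python
from typing import List, Tuple


def fictitious_play(
    payoff: List[List[float]], iterations: int = 20000
) -> Tuple[List[float], List[float], float]:
    """Approximate the MSNE of a zero-sum matrix game by fictitious play.

    payoff[i][j] is the policy player's payoff when policy i faces
    environment j; the environment player receives the negative.
    Returns (policy mixture, environment mixture, approximate game value).
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Cumulative payoff of each pure strategy against the opponent's past play.
    row_scores = [0.0] * n_rows
    col_scores = [0.0] * n_cols
    i, j = 0, 0  # arbitrary initial pure strategies
    for _ in range(iterations):
        row_counts[i] += 1
        col_counts[j] += 1
        for r in range(n_rows):
            row_scores[r] += payoff[r][j]
        for c in range(n_cols):
            col_scores[c] += payoff[i][c]
        # Each player best-responds to the opponent's empirical mixture.
        i = max(range(n_rows), key=lambda r: row_scores[r])
        j = min(range(n_cols), key=lambda c: col_scores[c])
    row_mix = [n / iterations for n in row_counts]
    col_mix = [n / iterations for n in col_counts]
    value = sum(
        row_mix[r] * payoff[r][c] * col_mix[c]
        for r in range(n_rows)
        for c in range(n_cols)
    )
    return row_mix, col_mix, value


# Invented example: each policy solves exactly one of the two environments.
payoff = [[1.0, 0.0], [0.0, 1.0]]
policy_mix, env_mix, value = fictitious_play(payoff)
print([round(p, 2) for p in policy_mix], [round(q, 2) for q in env_mix], round(value, 2))
```

For this matrix the exact equilibrium mixes both policies uniformly with game value 0.5, and the approximation should land close to that. The policy mixture plays the role of the meta-policy: neither specialist policy is discarded, so environments solved earlier remain covered. An exact solver, such as a linear program, would be the natural replacement for larger games.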
