Cooperation and Exploitation in LLM Policy Synthesis for Sequential Social Dilemmas
arXiv cs.CL / 3/23/2026
Key Points
- The paper uses large language models to iteratively synthesize Python policy functions for agents in sequential social dilemmas, evaluating each candidate via self-play and feeding the results back into the next generation round.
- It compares sparse feedback (scalar reward only) with dense feedback (reward plus social metrics: efficiency, equality, sustainability, and peace) across two canonical dilemmas (Gathering and Cleanup) and two frontier LLMs (Claude Sonnet 4.6 and Gemini 3.1 Pro); dense feedback often matches or exceeds sparse.
- Dense social metrics act as coordination signals that guide the LLM toward cooperative strategies such as territory partitioning, adaptive role assignment, and avoidance of wasteful aggression, without triggering over-optimization of fairness.
- The authors perform an adversarial experiment identifying five attack classes and discuss mitigations, highlighting a tension between expressiveness and safety in LLM policy synthesis.
- The work provides code at https://github.com/vicgalle/llm-policies-social-dilemmas, enabling replication and further study.
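The synthesis loop described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: `run_selfplay` stands in for rolling out a policy in Gathering or Cleanup, and the toy scoring, metric values, and function names are all assumptions for the sketch. The key contrast is that dense feedback serializes all social metrics into the next prompt, while sparse feedback would keep only the scalar reward.

```python
# Hypothetical sketch of the iterate-and-evaluate loop: an LLM proposes a
# Python policy function, self-play yields a scalar reward plus social
# metrics, and those metrics become the feedback for the next iteration.
from typing import Callable, Dict

def run_selfplay(policy: Callable[[dict], str]) -> Dict[str, float]:
    """Toy evaluator: a real one would roll the policy out in the
    Gathering/Cleanup environments; here we score a single trivial
    observable property as a placeholder."""
    cooperative = policy({"nearby_agent": True}) != "zap"
    reward = 1.0 if cooperative else 0.4
    return {
        "reward": reward,
        "efficiency": reward,
        "equality": 0.9,
        "sustainability": 0.8 if cooperative else 0.3,
        "peace": 1.0 if cooperative else 0.0,  # no wasteful aggression
    }

def dense_feedback(metrics: Dict[str, float]) -> str:
    """Dense variant: serialize every social metric into the prompt.
    The sparse variant would emit only the 'reward' entry."""
    return ", ".join(f"{k}={v:.2f}" for k, v in metrics.items())

# Two candidate policies an LLM might have generated on successive rounds.
def greedy_policy(obs: dict) -> str:
    return "zap" if obs.get("nearby_agent") else "harvest"

def cooperative_policy(obs: dict) -> str:
    return "harvest"

metrics = run_selfplay(cooperative_policy)
prompt_fragment = dense_feedback(metrics)  # appended to the next LLM prompt
```

Under this sketch, the cooperative candidate scores higher on peace and sustainability, so the dense feedback string rewards exactly the behaviors (avoiding aggression, conserving resources) that the paper reports the social metrics steer the LLM toward.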