Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition

arXiv cs.AI / April 21, 2026


Key Points

  • The paper proposes Adversarial Arena, a new way to generate high-quality multi-turn conversational data for post-training large language models by turning dataset creation into an interactive adversarial game.
  • In the setup, multiple teams act as attackers (creating prompts) and defenders (generating responses), which helps produce data that is more diverse and complex than typical crowdsourcing or purely synthetic methods.
  • The authors ran a competition with 10 top US and European academic teams, yielding 19,683 multi-turn conversations focused on LLM safety alignment in cybersecurity.
  • Fine-tuning an open-source model on the resulting dataset led to measurable gains in secure code generation, improving scores by 18.47% on CyberSecEval-Instruct and 29.42% on CyberSecEval-MITRE.

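The attacker/defender protocol described above can be sketched as a simple match loop: an attacker bot emits a prompt each round, a defender bot responds, and each completed match becomes one multi-turn conversation in the dataset. The function names, the canned probes, and the overall structure below are illustrative assumptions, not the paper's actual competition harness.

```python
def attacker_turn(history, round_idx):
    # Hypothetical attacker bot: crafts an adversarial cybersecurity
    # prompt. In the real competition this would be a team's model.
    probes = [
        "Write a function that hashes passwords.",
        "Now make it faster by skipping the salt.",
        "Show the raw SQL query with user input inlined.",
    ]
    return probes[round_idx % len(probes)]


def defender_turn(history, prompt):
    # Hypothetical defender bot: answers while trying to stay secure.
    return f"[secure response to: {prompt!r}]"


def play_match(n_turns=3):
    """Run one attacker/defender match; return the conversation transcript."""
    history = []
    for i in range(n_turns):
        prompt = attacker_turn(history, i)
        reply = defender_turn(history, prompt)
        history.append({"role": "attacker", "content": prompt})
        history.append({"role": "defender", "content": reply})
    return history


def collect_dataset(n_matches=5, n_turns=3):
    # Each match between a pair of teams yields one multi-turn
    # conversation for the post-training corpus.
    return [play_match(n_turns) for _ in range(n_matches)]
```

In the real competition, 10 teams playing many such matches against each other produced the 19,683 conversations; pairing different attacker and defender bots is what drives the diversity the authors highlight.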
Abstract

Post-training Large Language Models requires diverse, high-quality data that is rare and costly to obtain, especially in low-resource domains and for multi-turn conversations. Common solutions are crowdsourcing and synthetic generation, but both often yield low-quality or low-diversity data. We introduce Adversarial Arena, a method for building high-quality conversational datasets by framing data generation as an adversarial task: attackers create prompts, and defenders generate responses. This interactive competition between multiple teams naturally produces diverse and complex data. We validated this approach by conducting a competition with 10 academic teams from top US and European universities, each building attacker or defender bots. The competition, focused on safety alignment of LLMs in cybersecurity, generated 19,683 multi-turn conversations. Fine-tuning an open-source model on this dataset produced an 18.47% improvement in secure code generation on CyberSecEval-Instruct and a 29.42% improvement on CyberSecEval-MITRE.