STAR-Teaming: A Strategy-Response Multiplex Network Approach to Automated LLM Red Teaming

arXiv cs.CL / 4/22/2026

Key Points

  • The paper proposes STAR-Teaming, a black-box, automated framework for LLM red teaming that generates jailbreak-style prompts without requiring access to model internals.
  • STAR-Teaming combines a multi-agent system with a strategy–response multiplex network, using network-driven optimization to sample effective attack strategies.
  • By replacing an intractable high-dimensional embedding space with a more tractable network structure, the method improves interpretability of an LLM’s strategic vulnerabilities.
  • It also organizes the search space into semantic communities, which prevents redundant exploration and streamlines the search for effective strategies.
  • Experiments report substantially higher attack success rates at lower computational cost than prior approaches, and the authors release accompanying code for replication.

Abstract

While Large Language Models (LLMs) are widely used, they remain susceptible to jailbreak prompts that can elicit harmful or inappropriate responses. This paper introduces STAR-Teaming, a novel black-box framework for automated red teaming that effectively generates such prompts. STAR-Teaming integrates a Multi-Agent System (MAS) with a Strategy-Response Multiplex Network and employs network-driven optimization to sample effective attack strategies. This network-based approach recasts the intractable high-dimensional embedding space into a tractable structure, yielding two key advantages: it enhances the interpretability of the LLM's strategic vulnerabilities, and it streamlines the search for effective strategies by organizing the search space into semantic communities, thereby preventing redundant exploration. Empirical results demonstrate that STAR-Teaming significantly surpasses existing methods, achieving a higher attack success rate (ASR) at a lower computational cost. Extensive experiments validate the effectiveness and explainability of the Multiplex Network. The code is available at https://github.com/selectstar-ai/STAR-Teaming-paper.
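
To make the idea of sampling strategies from a community-structured network more concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the layers, the community-detection method, or the sampling rule, so everything below is an assumption made for illustration: the layer names ("semantic", "response"), the placeholder strategy labels, the use of connected components as a stand-in for community detection, and the smoothed success-rate weighting.

```python
import random
from collections import defaultdict


class MultiplexNetwork:
    """Toy multiplex graph: one shared node set, one edge set per named layer.

    Hypothetical structure for illustration only; the paper's actual
    network construction is not described in the abstract.
    """

    def __init__(self):
        self.nodes = set()
        # layer name -> node -> set of neighbouring nodes
        self.layers = defaultdict(lambda: defaultdict(set))

    def add_edge(self, layer, u, v):
        self.nodes.update((u, v))
        self.layers[layer][u].add(v)
        self.layers[layer][v].add(u)

    def communities(self, layer):
        """Connected components of one layer, used here as a crude
        stand-in for a real community-detection algorithm."""
        seen, groups = set(), []
        for start in sorted(self.nodes):
            if start in seen:
                continue
            stack, group = [start], set()
            while stack:
                node = stack.pop()
                if node in group:
                    continue
                group.add(node)
                stack.extend(self.layers[layer][node] - group)
            seen |= group
            groups.append(group)
        return groups


def sample_strategy(groups, successes, attempts, rng):
    """Pick a community in proportion to its smoothed success rate,
    then pick one strategy uniformly inside that community."""
    weights = []
    for group in groups:
        wins = sum(successes.get(s, 0) for s in group)
        tries = sum(attempts.get(s, 0) for s in group)
        weights.append((wins + 1) / (tries + 2))  # Laplace smoothing
    chosen = rng.choices(groups, weights=weights, k=1)[0]
    return rng.choice(sorted(chosen))


# Placeholder strategy labels; real strategies would come from the red-teaming agents.
net = MultiplexNetwork()
net.add_edge("semantic", "strategy_a", "strategy_b")
net.add_edge("semantic", "strategy_c", "strategy_d")
net.add_edge("response", "strategy_a", "strategy_d")

groups = net.communities("semantic")

# Made-up bookkeeping of past attempts, purely to exercise the sampler.
attempts = {"strategy_a": 5, "strategy_b": 5, "strategy_c": 5, "strategy_d": 5}
successes = {"strategy_a": 0, "strategy_b": 0, "strategy_c": 4, "strategy_d": 5}

rng = random.Random(0)
picked = sample_strategy(groups, successes, attempts, rng)
print(groups, picked)
```

Sampling at the community level rather than over individual strategies is one simple way to avoid repeatedly trying near-duplicate strategies, which is the intuition the abstract describes. A fuller version would likely replace the connected-components step with a proper community-detection algorithm and would also use the second layer, which this sketch only stores.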