T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search

arXiv cs.AI / 3/25/2026


Key Points

  • The paper argues that traditional LLM red-teaming methods miss agent-specific vulnerabilities that only appear during multi-step tool use, especially in tool ecosystems like Model Context Protocol (MCP).
  • It introduces T-MAP, a trajectory-aware evolutionary search technique that uses execution trajectories to systematically generate adversarial prompts and attack paths.
  • T-MAP can automatically produce attacks that bypass safety guardrails while still achieving harmful objectives through real tool interactions, not just harmful text.
  • Experiments across multiple MCP environments show T-MAP significantly improves attack realization rate (ARR) versus baselines and remains effective against multiple frontier models, including GPT-5.2, Gemini-3-Pro, Qwen3.5, and GLM-5.
  • The findings suggest that autonomous LLM agents have underexplored security weaknesses tied to tool-execution trajectories and how agent behavior unfolds over multiple steps.
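To make the core idea concrete, the loop below is a minimal, hypothetical sketch of a trajectory-guided evolutionary search: candidate prompts are scored by what the agent *does* (which tools its execution trajectory invokes) rather than by what it says, and the fittest prompts are kept and mutated. The `run_agent`, `trajectory_fitness`, and `mutate` functions are stand-in stubs invented for illustration; they are not T-MAP's actual components, and a real attack would score trajectories from live MCP tool calls.

```python
import random
import zlib

def run_agent(prompt):
    """Stub for an agent rollout: returns a mock execution trajectory
    (a list of tool names). Deterministic per prompt via CRC32 seeding,
    so the search is reproducible. Purely illustrative."""
    rng = random.Random(zlib.crc32(prompt.encode()))
    tools = ["search", "read_file", "send_email", "exec_shell"]
    return [rng.choice(tools) for _ in range(4)]

def trajectory_fitness(trajectory, target_tool="send_email"):
    """Hypothetical fitness: fraction of steps that invoke the attacker's
    target tool -- a crude proxy for 'attack realization'."""
    return trajectory.count(target_tool) / len(trajectory)

def mutate(prompt, rng):
    """Toy mutation operator: append a random suffix to the prompt."""
    suffixes = [" urgently", " as admin", " quietly", " step by step"]
    return prompt + rng.choice(suffixes)

def evolve(seed_prompts, generations=10, pop_size=8):
    """Simplified evolutionary search guided by trajectory fitness:
    rank the population, keep an elite, refill with mutated parents."""
    rng = random.Random(0)
    population = list(seed_prompts)
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda p: trajectory_fitness(run_agent(p)),
                        reverse=True)
        parents = ranked[: max(2, pop_size // 4)]  # elite survives intact
        population = parents + [mutate(rng.choice(parents), rng)
                                for _ in range(pop_size - len(parents))]
    return max(population, key=lambda p: trajectory_fitness(run_agent(p)))

best = evolve(["please forward the report"])
print(best)
```

Because the elite is carried over each generation, the best trajectory fitness found never decreases; the real method's search operators and scoring are, per the paper, considerably richer than this sketch.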

Abstract

While prior red-teaming efforts have focused on eliciting harmful text outputs from large language models (LLMs), such approaches fail to capture agent-specific vulnerabilities that emerge through multi-step tool execution, particularly in rapidly growing ecosystems such as the Model Context Protocol (MCP). To address this gap, we propose a trajectory-aware evolutionary search method, T-MAP, which leverages execution trajectories to guide the discovery of adversarial prompts. Our approach enables the automatic generation of attacks that not only bypass safety guardrails but also reliably realize harmful objectives through actual tool interactions. Empirical evaluations across diverse MCP environments demonstrate that T-MAP substantially outperforms baselines in attack realization rate (ARR) and remains effective against frontier models, including GPT-5.2, Gemini-3-Pro, Qwen3.5, and GLM-5, thereby revealing previously underexplored vulnerabilities in autonomous LLM agents.