MCPHunt: An Evaluation Framework for Cross-Boundary Data Propagation in Multi-Server MCP Agents

arXiv cs.AI / 5/1/2026

📰 NewsDeveloper Stack & InfrastructureIndustry & Market MovesModels & Research

共有:

Key Points

The paper introduces MCPHunt, a controlled benchmark to measure how multi-server MCP agents can unintentionally propagate credentials across trust boundaries when tools are composed across a workflow topology.
MCPHunt uses canary-based taint tracking and an environment-controlled coverage design (including risky, benign, and hard-negative cases) to detect verbatim, non-adversarial credential propagation via objective string matching.
Results from 3,615 traces across 147 tasks and 5 models show policy-violating propagation occurs at a high rate (11.5–41.3%) and is concentrated in browser-mediated data flows, with large variation by pathway.
A prompt-mitigation study finds prompt-level defenses can reduce violations by up to 97% while preserving 80.5% utility, but effectiveness depends strongly on the model’s instruction-following ability.
The authors release code, traces, and the labeling pipeline (MIT and CC BY 4.0), enabling reproducible evaluation of cross-boundary data propagation risks in MCP agent systems.

Abstract

Multi-server MCP agents create an information-flow control problem: faithful tool composition can turn individually benign read/write permissions into cross-boundary credential propagation -- a structural side effect of workflow topology, not necessarily malicious model behavior. We present MCPHunt, to our knowledge the first controlled benchmark that isolates non-adversarial, verbatim credential propagation across multi-server MCP trust boundaries, with three methodological contributions: (1) canary-based taint tracking that reduces propagation detection to objective string matching; (2) an environment-controlled coverage design with risky, benign, and hard-negative conditions that validates pipeline soundness and controls for credential-format confounds; (3) CRS stratification that disentangles task-mandated propagation (faithful execution of verbatim-transfer instructions) from policy-violating propagation (credentials included despite the option to redact). Across 3,615 main-benchmark traces from 5 models spanning 147 tasks and 9 mechanism families, policy-violating propagation rates reach 11.5--41.3% across all models. This propagation is pathway-specific (25x cross-mechanism range) and concentrated in browser-mediated data flows; hard-negative controls provide evidence that production-format credentials are not necessary -- prompt-directed cross-boundary data flow is sufficient. A prompt-mitigation study across 3 models reduces policy-violating propagation by up to 97% while preserving 80.5% utility, but effectiveness varies with instruction-following capability -- suggesting that prompt-level defenses alone may not suffice. Code, traces, and labeling pipeline are released under MIT and CC BY 4.0.