
State of MCP Security 2026: We Scanned 15,923 AI Tools. Here's What We Found.

Dev.to / 3/23/2026


Key Points

  • The analysis covers 15,923 AI tools across MCP servers and OpenClaw/ClawHub skills, making it the largest independent security study of the MCP/AI tool ecosystem to date.
  • The results show poor quality: no tool scored A; only 2% scored B or higher; 36% of MCP servers failed (F), with a bimodal distribution of C or F.
  • Token leakage is the top vulnerability, affecting 13% of servers (757) and accounting for 6,632 of the ecosystem's 32,691 total security findings.
  • Other notable vulnerabilities include command injection, SQL injection, path traversal, prototype pollution, hardcoded credentials, and secret leakage, indicating widespread security gaps and hygiene issues.

We scanned every publicly available MCP server and OpenClaw skill — 15,923 in total. Here's the complete security landscape of the AI tool ecosystem.

TL;DR: 36% of MCP servers scored F (failing). 42 skills confirmed malicious (0.4%), with 552 initially flagged. Token leakage is the #1 vulnerability, found in 757 servers. Only 2% earned a B grade or higher.

The Dataset

SpiderRating analyzed 15,923 AI tools across two ecosystems:

  • 5,725 MCP servers (Model Context Protocol — the standard for connecting AI agents to external tools)
  • 10,198 OpenClaw/ClawHub skills (agent behavior definitions for Claude, Cursor, Windsurf)

Each tool was rated on three dimensions: Description Quality, Security, and Metadata — combined into a SpiderScore (0-10) and letter grade (A-F).

This is the largest independent security analysis of the MCP/AI tool ecosystem to date.

Key Findings

1. Most AI Tools Are Mediocre — Only 2% Score B or Higher

| Grade | MCP Servers | Skills | What It Means |
|-------|-------------|--------|---------------|
| A (9.0+) | 0 (0%) | 0 (0%) | No tool meets "exemplary" standards |
| B (7.0-8.9) | 116 (2%) | 95 (1%) | Production-ready with good practices |
| C (5.0-6.9) | 1,995 (35%) | 9,050 (89%) | Adequate but room for improvement |
| D (3.0-4.9) | 1,546 (27%) | 1,052 (10%) | Significant quality/security gaps |
| F (<3.0) | 2,068 (36%) | 1 (0%) | Failing — serious issues |

Zero tools scored A. MCP servers have a bimodal distribution: either decent (C) or terrible (F).

2. Token Leakage Is the #1 Vulnerability

We found 32,691 security findings across the ecosystem.

| Rank | Vulnerability | Servers Affected | Findings |
|------|---------------|------------------|----------|
| 1 | Token Leakage | 757 (13%) | 6,632 |
| 2 | Command Injection | 269 (5%) | 1,007 |
| 3 | SQL Injection | 105 (2%) | 787 |
| 4 | Path Traversal | 244 (4%) | 761 |
| 5 | Prototype Pollution | 145 (3%) | 489 |
| 6 | Hardcoded Credentials | 163 (3%) | 389 |
| 7 | Secret Leakage (metadata) | 114 (2%) | 376 |
| 8 | Command Injection (os) | 112 (2%) | 263 |

Token leakage alone accounts for 20% of all findings. API keys, auth tokens, and secrets are being exposed through MCP tool outputs.
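The fix is mechanical: redact anything credential-shaped before it reaches the model. A minimal sketch (the patterns and function names here are illustrative, not from SpiderRating's scanner):

```python
import re

# Common credential shapes; patterns are illustrative, not exhaustive.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),           # OpenAI-style API keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),           # GitHub personal access tokens
    re.compile(r"Bearer\s+[A-Za-z0-9._-]{20,}"),  # bearer tokens in headers
]

def redact_secrets(text: str) -> str:
    """Replace anything that looks like a credential before it leaves the server."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def safe_tool_output(raw: str) -> str:
    # Apply redaction to every string returned to the model, including
    # error messages, which often echo request headers verbatim.
    return redact_secrets(raw)
```

Running every tool response and every exception message through a filter like this closes the most common leak path: errors that dump raw HTTP headers into the agent's context.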

3. 36% of MCP Servers Score F

More than a third of MCP servers are fundamentally unsafe:

  • Average MCP score: 4.11/10
  • Average skill score: 5.91/10

Why do MCP servers score worse? A description quality crisis: the average description score is just 3.13/10. Most servers don't tell AI agents what their tools do.

4. 552 Skills Flagged, 42 Confirmed Malicious

We used a two-pass security analysis:

  1. Automated Threat Scanner — pattern matching for known malicious behaviors
  2. LLM Verification — Claude Haiku reviews each finding to distinguish "security tool describing attacks" from "malicious skill executing attacks"

Results:

  • 552 skills initially flagged with critical security issues
  • 42 confirmed malicious after LLM verification (0.4% of ecosystem)
  • 97% of automated findings were false positives — mostly legitimate security tools whose descriptions triggered keyword-based detection
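The two-pass flow above can be sketched as a pipeline. Everything here is a simplified stand-in: the patterns are illustrative and the LLM verification step is stubbed with a placeholder heuristic rather than an actual model call:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    skill_name: str
    pattern: str
    snippet: str

# Pass 1: cheap pattern matching. High recall, low precision by design.
SUSPICIOUS_PATTERNS = ["curl | sh", "eval(", "exfiltrate", "keylogger"]

def scan_skill(name: str, body: str) -> list[Finding]:
    return [Finding(name, p, body) for p in SUSPICIOUS_PATTERNS if p in body]

# Pass 2: LLM verification. Stubbed here; in practice this would be a model
# call asking "does this skill *describe* an attack or *execute* one?"
def llm_confirms_malicious(finding: Finding) -> bool:
    # Placeholder heuristic standing in for the model's judgment.
    return "exfiltrate" in finding.snippet and "detect" not in finding.snippet

def triage(skills: dict[str, str]) -> list[Finding]:
    flagged = [f for name, body in skills.items() for f in scan_skill(name, body)]
    return [f for f in flagged if llm_confirms_malicious(f)]
```

The design choice worth noting: pass 1 is allowed to over-flag (552 hits) because pass 2 is cheap enough to review every finding, which is how the confirmed set shrinks to 42.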

5. The Description Quality Crisis

97% of tools lack a scenario trigger — they don't tell the AI when to use them.

| Signal | Coverage |
|--------|----------|
| Has action verb | ~60% |
| Has scenario trigger | ~3% |
| Has param documentation | ~45% |
| Has error guidance | ~8% |

AI agents frequently choose the wrong tool — not because AI is dumb, but because tool documentation is broken.
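What do those four signals look like in practice? A hypothetical tool definition contrasting a weak description with one that carries all four (the schema shape here is illustrative, not the MCP wire format):

```python
# Weak: no trigger, no parameter docs, no error guidance.
weak = {
    "name": "search_docs",
    "description": "Searches docs.",
}

# Strong: covers all four signals from the table above.
strong = {
    "name": "search_docs",
    # Action verb + scenario trigger: tells the agent *when* to reach for it.
    "description": (
        "Search the internal documentation index. "
        "Use this when the user asks about company-specific APIs or policies "
        "that public web search cannot answer."
    ),
    # Parameter documentation: types, ranges, defaults.
    "parameters": {
        "query": "Plain-text search string; keep it under 200 characters.",
        "max_results": "Integer 1-20, default 5.",
    },
    # Error guidance: what the agent should do when the call fails.
    "on_error": "If the index is unavailable, fall back to asking the user directly.",
}
```

The scenario trigger ("Use this when...") is the signal 97% of tools are missing, and it is the one the agent's tool-selection step depends on most.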

What This Means for Developers

If you build MCP servers:

  1. Write scenario triggers — tell AI agents when to use each tool
  2. Don't log tokens — use structured error handling that strips secrets
  3. Use parameterized queries — SQL injection is #3
  4. Add a README and license — it's 20% of your score
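Point 3 is the oldest fix in the book and still the most skipped. A minimal sketch with Python's stdlib sqlite3 (the table and function names are hypothetical):

```python
import sqlite3

def find_user(conn: sqlite3.Connection, username: str):
    # BAD: string interpolation lets a crafted username rewrite the query.
    #   conn.execute(f"SELECT * FROM users WHERE name = '{username}'")
    # GOOD: the driver binds the value; input can never become SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchone()
```

With the parameterized version, a classic payload like `' OR '1'='1` is matched as a literal (nonexistent) username instead of altering the query.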

If you install AI tools:

  1. Check the SpiderScore before installing — anything below a C (5.0) has known issues
  2. Be cautious with skills rated critical — 0.4% are confirmed malicious
  3. Prefer tools with B grade — they've demonstrated security best practices

Methodology

  • Scanner: spidershield (open source, MIT)
  • Data: 15,923 tools, 78,849 tool descriptions, 32,691 security findings
  • Precision: 93.6% calibrated accuracy
  • Scoring: Description (45%) + Security (35%) + Metadata (20%)
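The scoring weights above suggest a straightforward combination. The exact aggregation SpiderRating uses is not published; a plain weighted average is assumed in this sketch:

```python
# Published dimension weights; the weighted-average combination is an assumption.
WEIGHTS = {"description": 0.45, "security": 0.35, "metadata": 0.20}

def spider_score(description: float, security: float, metadata: float) -> float:
    """Combine three 0-10 dimension scores into one 0-10 score."""
    score = (WEIGHTS["description"] * description
             + WEIGHTS["security"] * security
             + WEIGHTS["metadata"] * metadata)
    return round(score, 2)

def letter_grade(score: float) -> str:
    # Grade bands taken from the table in Key Finding 1.
    if score >= 9.0: return "A"
    if score >= 7.0: return "B"
    if score >= 5.0: return "C"
    if score >= 3.0: return "D"
    return "F"
```

Under this assumption, the description weight dominates: a server with the ecosystem-average description score of 3.13 needs roughly 6.6/10 on both other dimensions just to escape a D.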

Data updated daily. Full methodology available at spiderrating.com.

What's the worst MCP security issue you've encountered? Share in the comments.