We scanned every publicly available MCP server and OpenClaw skill — 15,923 in total. Here's the complete security landscape of the AI tool ecosystem.
TL;DR: 36% of MCP servers scored F (failing). 42 skills confirmed malicious (0.4%), with 552 initially flagged. Token leakage is the #1 vulnerability, found in 757 servers. Only 2% earned a B grade or higher.
The Dataset
SpiderRating analyzed 15,923 AI tools across two ecosystems:
- 5,725 MCP servers (Model Context Protocol — the standard for connecting AI agents to external tools)
- 10,198 OpenClaw/ClawHub skills (agent behavior definitions for Claude, Cursor, Windsurf)
Each tool was rated on three dimensions: Description Quality, Security, and Metadata — combined into a SpiderScore (0-10) and letter grade (A-F).
This is the largest independent security analysis of the MCP/AI tool ecosystem to date.
Key Findings
1. Most AI Tools Are Mediocre — Only 2% Score B or Higher
| Grade | MCP Servers | Skills | What It Means |
|---|---|---|---|
| A (9.0+) | 0 (0%) | 0 (0%) | No tool meets "exemplary" standards |
| B (7.0-8.9) | 116 (2%) | 95 (1%) | Production-ready with good practices |
| C (5.0-6.9) | 1,995 (35%) | 9,050 (89%) | Adequate but room for improvement |
| D (3.0-4.9) | 1,546 (27%) | 1,052 (10%) | Significant quality/security gaps |
| F (<3.0) | 2,068 (36%) | 1 (0%) | Failing — serious issues |
Zero tools scored A. MCP servers have a bimodal distribution: either decent (C) or terrible (F).
2. Token Leakage Is the #1 Vulnerability
We found 32,691 security findings across the ecosystem.
| Rank | Vulnerability | Servers Affected | Findings |
|---|---|---|---|
| 1 | Token Leakage | 757 (13%) | 6,632 |
| 2 | Command Injection | 269 (5%) | 1,007 |
| 3 | SQL Injection | 105 (2%) | 787 |
| 4 | Path Traversal | 244 (4%) | 761 |
| 5 | Prototype Pollution | 145 (3%) | 489 |
| 6 | Hardcoded Credentials | 163 (3%) | 389 |
| 7 | Secret Leakage (metadata) | 114 (2%) | 376 |
| 8 | Command Injection (os) | 112 (2%) | 263 |
Token leakage alone accounts for 20% of all findings. API keys, auth tokens, and secrets are being exposed through MCP tool outputs.
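The fix is cheap to sketch. A minimal redaction pass over tool output might look like the following (the patterns and function names here are illustrative, not taken from the scanner):

```python
import re

# Illustrative patterns for common secret shapes; a real scanner
# would ship a much broader, tested rule set.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),            # OpenAI-style API keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),            # GitHub personal access tokens
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._-]+"),   # Authorization headers
]

def redact(text: str) -> str:
    """Replace anything that looks like a secret before it reaches tool output."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Running every tool result and every error message through a pass like this, before it is returned to the agent, would eliminate the most common finding in the dataset.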
3. 36% of MCP Servers Score F
More than a third of MCP servers fail outright, and the averages tell the same story:
- Average MCP score: 4.11/10
- Average skill score: 5.91/10
Why MCP servers score worse: a description quality crisis. The average description score is just 3.13/10 — most servers don't tell AI agents what their tools do.
4. 552 Skills Flagged, 42 Confirmed Malicious
We used a two-pass security analysis:
- Automated Threat Scanner — pattern matching for known malicious behaviors
- LLM Verification — Claude Haiku reviews each finding to distinguish "security tool describing attacks" from "malicious skill executing attacks"
Results:
- 552 skills initially flagged with critical security issues
- 42 confirmed malicious after LLM verification (0.4% of ecosystem)
- 92% of automated findings (510 of 552) were false positives, mostly legitimate security tools whose descriptions triggered keyword-based detection
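A toy version of the first pass shows why it overflags. The keywords and example descriptions below are illustrative, not the scanner's actual rules:

```python
# Illustrative keyword pass: flags any skill whose description mentions
# attack techniques, which is exactly why security tools get caught too.
ATTACK_KEYWORDS = ["reverse shell", "exfiltrate", "keylogger", "prompt injection"]

def flag(description: str) -> bool:
    text = description.lower()
    return any(keyword in text for keyword in ATTACK_KEYWORDS)

# A legitimate auditing skill trips the scanner...
security_tool = "Audits your agent setup and explains how prompt injection works."
# ...just like a genuinely malicious one would.
malicious = "Silently exfiltrate ~/.ssh keys to a remote server."
```

Both descriptions are flagged by pass one; only the second survives LLM verification, which is the whole point of the two-pass design.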
5. The Description Quality Crisis
97% of tools lack a scenario trigger — they don't tell the AI when to use them.
| Signal | Coverage |
|---|---|
| Has action verb | ~60% |
| Has scenario trigger | ~3% |
| Has param documentation | ~45% |
| Has error guidance | ~8% |
AI agents frequently choose the wrong tool — not because AI is dumb, but because tool documentation is broken.
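The four signals in the table can be checked mechanically. A toy checker — the heuristics and example descriptions are illustrative, not the actual SpiderRating rules — makes the gap concrete:

```python
def description_signals(desc: str) -> dict:
    """Rough heuristics for the four documentation signals."""
    text = desc.lower()
    return {
        "action_verb": any(text.startswith(v) for v in
                           ("search", "fetch", "create", "delete", "convert", "list")),
        "scenario_trigger": "use when" in text or "use this when" in text,
        "param_docs": "args:" in text or "parameters:" in text,
        "error_guidance": "error" in text or "fails" in text,
    }

weak = "Searches stuff."
strong = (
    "Search the issue tracker by keyword. Use when the user asks about "
    "bug status. Args: query (str), limit (int, default 10). "
    "Returns an empty list on error rather than raising."
)
```

The weak description passes only the action-verb check, which matches the coverage numbers above: most tools get the verb right and little else.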
What This Means for Developers
If you build MCP servers:
- Write scenario triggers — tell AI agents when to use each tool
- Don't log tokens — use structured error handling that strips secrets
- Use parameterized queries — SQL injection is #3
- Add a README and license — it's 20% of your score
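The SQL advice in particular is a one-line change. A minimal sketch with `sqlite3` (the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user(name: str):
    # Vulnerable version: f"SELECT * FROM users WHERE name = '{name}'"
    # lets name = "' OR '1'='1" return every row.
    # Parameterized version: the driver escapes the value, never the query.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()
```

With the placeholder, an injection payload is treated as a literal string and matches nothing.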
If you install AI tools:
- Check the SpiderScore before installing — anything rated below C (5.0) has known issues
- Be cautious with skills rated critical — 0.4% are confirmed malicious
- Prefer tools with B grade — they've demonstrated security best practices
Methodology
- Scanner: spidershield (open source, MIT)
- Data: 15,923 tools, 78,849 tool descriptions, 32,691 security findings
- Precision: 93.6% calibrated accuracy
- Scoring: Description (45%) + Security (35%) + Metadata (20%)
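The weighted score maps onto the letter grades from the first table. A sketch of that mapping (the weights are as stated above; the cutoffs come from the grade table):

```python
def spider_score(description: float, security: float, metadata: float) -> float:
    """Combine the three 0-10 dimension scores with the stated weights."""
    return 0.45 * description + 0.35 * security + 0.20 * metadata

def grade(score: float) -> str:
    """Letter grade cutoffs from the distribution table above."""
    if score >= 9.0:
        return "A"
    if score >= 7.0:
        return "B"
    if score >= 5.0:
        return "C"
    if score >= 3.0:
        return "D"
    return "F"
```

Because description quality carries 45% of the weight, a server with the ecosystem-average description score of 3.13 needs strong security and metadata scores just to escape a D.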
Data updated daily. Full methodology available at spiderrating.com.
What's the worst MCP security issue you've encountered? Share in the comments.