
State of MCP Security 2026: We Scanned 15,923 AI Tools. Here's What We Found.

Dev.to / 3/23/2026


Key Points

  • The analysis covers 15,923 AI tools across MCP servers and OpenClaw/ClawHub skills, making it the largest independent security study of the MCP/AI tool ecosystem to date.
  • The results show poor quality: no tool scored A; only 2% scored B or higher; 36% of MCP servers failed (F), with a bimodal distribution of C or F.
  • Token leakage is the top vulnerability, affecting 13% of servers (757) and accounting for 6,632 of the ecosystem's 32,691 total security findings.
  • Other notable vulnerabilities include command injection, SQL injection, path traversal, prototype pollution, hardcoded credentials, and secret leakage, indicating widespread security gaps and hygiene issues.

We scanned every publicly available MCP server and OpenClaw skill — 15,923 in total. Here's the complete security landscape of the AI tool ecosystem.

TL;DR: 36% of MCP servers scored F (failing). 42 skills confirmed malicious (0.4%), with 552 initially flagged. Token leakage is the #1 vulnerability, found in 757 servers. Only 2% earned a B grade or higher.

The Dataset

SpiderRating analyzed 15,923 AI tools across two ecosystems:

  • 5,725 MCP servers (Model Context Protocol — the standard for connecting AI agents to external tools)
  • 10,198 OpenClaw/ClawHub skills (agent behavior definitions for Claude, Cursor, Windsurf)

Each tool was rated on three dimensions: Description Quality, Security, and Metadata — combined into a SpiderScore (0-10) and letter grade (A-F).

This is the largest independent security analysis of the MCP/AI tool ecosystem to date.

Key Findings

1. Most AI Tools Are Mediocre — Only 2% Score B or Higher

| Grade | MCP Servers | Skills | What It Means |
|-------|-------------|--------|---------------|
| A (9.0+) | 0 (0%) | 0 (0%) | No tool meets "exemplary" standards |
| B (7.0-8.9) | 116 (2%) | 95 (1%) | Production-ready with good practices |
| C (5.0-6.9) | 1,995 (35%) | 9,050 (89%) | Adequate but room for improvement |
| D (3.0-4.9) | 1,546 (27%) | 1,052 (10%) | Significant quality/security gaps |
| F (<3.0) | 2,068 (36%) | 1 (0%) | Failing — serious issues |

Zero tools scored A. MCP servers have a bimodal distribution: either decent (C) or terrible (F).

2. Token Leakage Is the #1 Vulnerability

We found 32,691 security findings across the ecosystem.

| Rank | Vulnerability | Servers Affected | Findings |
|------|---------------|------------------|----------|
| 1 | Token Leakage | 757 (13%) | 6,632 |
| 2 | Command Injection | 269 (5%) | 1,007 |
| 3 | SQL Injection | 105 (2%) | 787 |
| 4 | Path Traversal | 244 (4%) | 761 |
| 5 | Prototype Pollution | 145 (3%) | 489 |
| 6 | Hardcoded Credentials | 163 (3%) | 389 |
| 7 | Secret Leakage (metadata) | 114 (2%) | 376 |
| 8 | Command Injection (os) | 112 (2%) | 263 |

Token leakage alone accounts for 20% of all findings. API keys, auth tokens, and secrets are being exposed through MCP tool outputs.
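The fix is mechanical: redact anything credential-shaped before it reaches the model. A minimal sketch (the patterns and function names here are illustrative, not from SpiderRating's scanner):

```python
import re

# Common credential shapes; patterns are illustrative, not exhaustive.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),           # OpenAI-style API keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),           # GitHub personal access tokens
    re.compile(r"Bearer\s+[A-Za-z0-9._-]{20,}"),  # bearer tokens in headers
]

def redact_secrets(text: str) -> str:
    """Replace anything that looks like a credential before it leaves the server."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def safe_tool_output(raw: str) -> str:
    # Apply redaction to every string returned to the model, including
    # error messages, which often echo request headers verbatim.
    return redact_secrets(raw)
```

Running every tool response and every exception message through a filter like this closes the most common leak path: errors that dump raw HTTP headers into the agent's context.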

3. 36% of MCP Servers Score F

More than a third of MCP servers are fundamentally unsafe:

  • Average MCP score: 4.11/10
  • Average skill score: 5.91/10

Why do MCP servers score worse? A description quality crisis: the average description score is just 3.13/10. Most servers don't tell AI agents what their tools do.

4. 552 Skills Flagged, 42 Confirmed Malicious

We used a two-pass security analysis:

  1. Automated Threat Scanner — pattern matching for known malicious behaviors
  2. LLM Verification — Claude Haiku reviews each finding to distinguish "security tool describing attacks" from "malicious skill executing attacks"

Results:

  • 552 skills initially flagged with critical security issues
  • 42 confirmed malicious after LLM verification (0.4% of ecosystem)
  • 97% of automated findings were false positives — mostly legitimate security tools whose descriptions triggered keyword-based detection
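The two-pass flow above can be sketched as a pipeline. Everything here is a simplified stand-in: the patterns are illustrative and the LLM verification step is stubbed with a placeholder heuristic rather than an actual model call:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    skill_name: str
    pattern: str
    snippet: str

# Pass 1: cheap pattern matching. High recall, low precision by design.
SUSPICIOUS_PATTERNS = ["curl | sh", "eval(", "exfiltrate", "keylogger"]

def scan_skill(name: str, body: str) -> list[Finding]:
    return [Finding(name, p, body) for p in SUSPICIOUS_PATTERNS if p in body]

# Pass 2: LLM verification. Stubbed here; in practice this would be a model
# call asking "does this skill *describe* an attack or *execute* one?"
def llm_confirms_malicious(finding: Finding) -> bool:
    # Placeholder heuristic standing in for the model's judgment.
    return "exfiltrate" in finding.snippet and "detect" not in finding.snippet

def triage(skills: dict[str, str]) -> list[Finding]:
    flagged = [f for name, body in skills.items() for f in scan_skill(name, body)]
    return [f for f in flagged if llm_confirms_malicious(f)]
```

The design choice worth noting: pass 1 is allowed to over-flag (552 hits) because pass 2 is cheap enough to review every finding, which is how the confirmed set shrinks to 42.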

5. The Description Quality Crisis

97% of tools lack a scenario trigger — they don't tell the AI when to use them.

| Signal | Coverage |
|--------|----------|
| Has action verb | ~60% |
| Has scenario trigger | ~3% |
| Has param documentation | ~45% |
| Has error guidance | ~8% |

AI agents frequently choose the wrong tool — not because AI is dumb, but because tool documentation is broken.
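What do those four signals look like in practice? A hypothetical tool definition contrasting a weak description with one that carries all four (the schema shape here is illustrative, not the MCP wire format):

```python
# Weak: no trigger, no parameter docs, no error guidance.
weak = {
    "name": "search_docs",
    "description": "Searches docs.",
}

# Strong: covers all four signals from the table above.
strong = {
    "name": "search_docs",
    # Action verb + scenario trigger: tells the agent *when* to reach for it.
    "description": (
        "Search the internal documentation index. "
        "Use this when the user asks about company-specific APIs or policies "
        "that public web search cannot answer."
    ),
    # Parameter documentation: types, ranges, defaults.
    "parameters": {
        "query": "Plain-text search string; keep it under 200 characters.",
        "max_results": "Integer 1-20, default 5.",
    },
    # Error guidance: what the agent should do when the call fails.
    "on_error": "If the index is unavailable, fall back to asking the user directly.",
}
```

The scenario trigger ("Use this when...") is the signal 97% of tools are missing, and it is the one the agent's tool-selection step depends on most.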

What This Means for Developers

If you build MCP servers:

  1. Write scenario triggers — tell AI agents when to use each tool
  2. Don't log tokens — use structured error handling that strips secrets
  3. Use parameterized queries — SQL injection is #3
  4. Add a README and license — it's 20% of your score
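Point 3 is the oldest fix in the book and still the most skipped. A minimal sketch with Python's stdlib sqlite3 (the table and function names are hypothetical):

```python
import sqlite3

def find_user(conn: sqlite3.Connection, username: str):
    # BAD: string interpolation lets a crafted username rewrite the query.
    #   conn.execute(f"SELECT * FROM users WHERE name = '{username}'")
    # GOOD: the driver binds the value; input can never become SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchone()
```

With the parameterized version, a classic payload like `' OR '1'='1` is matched as a literal (nonexistent) username instead of altering the query.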

If you install AI tools:

  1. Check the SpiderScore before installing — anything below a C (5.0) has known issues
  2. Be cautious with skills rated critical — 0.4% are confirmed malicious
  3. Prefer tools with B grade — they've demonstrated security best practices

Methodology

  • Scanner: spidershield (open source, MIT)
  • Data: 15,923 tools, 78,849 tool descriptions, 32,691 security findings
  • Precision: 93.6% calibrated accuracy
  • Scoring: Description (45%) + Security (35%) + Metadata (20%)
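The scoring weights above suggest a straightforward combination. The exact aggregation SpiderRating uses is not published; a plain weighted average is assumed in this sketch:

```python
# Published dimension weights; the weighted-average combination is an assumption.
WEIGHTS = {"description": 0.45, "security": 0.35, "metadata": 0.20}

def spider_score(description: float, security: float, metadata: float) -> float:
    """Combine three 0-10 dimension scores into one 0-10 score."""
    score = (WEIGHTS["description"] * description
             + WEIGHTS["security"] * security
             + WEIGHTS["metadata"] * metadata)
    return round(score, 2)

def letter_grade(score: float) -> str:
    # Grade bands taken from the table in Key Finding 1.
    if score >= 9.0: return "A"
    if score >= 7.0: return "B"
    if score >= 5.0: return "C"
    if score >= 3.0: return "D"
    return "F"
```

Under this assumption, the description weight dominates: a server with the ecosystem-average description score of 3.13 needs roughly 6.6/10 on both other dimensions just to escape a D.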

Data updated daily. Full methodology available at spiderrating.com.

What's the worst MCP security issue you've encountered? Share in the comments.