A Claude Code hook that warns you before calling a low-trust MCP server

Dev.to / 4/20/2026


Key Points

  • Researchers reported security/design problems in MCP stdio transport that can enable unchecked arbitrary command execution, and found most tested MCP marketplaces to be “poisonable.”
  • Anthropic said protocol-level STDIO fixes are out of scope and argued that operational trust should be handled by the ecosystem, leaving a practical gap for users of Claude Code.
  • The article introduces a zero-config “Claude Code hook” that checks a server’s trust via a public trust API before each MCP tool call and warns Claude inline when trust scores are low.
  • After each call, the hook generates an Ed25519-signed “receipt” (using hashes rather than raw content) and sends it to a public aggregator to update the trust score based on real call outcomes.
  • The proposed workflow aims to answer the key question—whether an MCP server actually works safely in real-time—rather than relying only on metadata, popularity, or static repository scans.

Last week researchers at Ox published findings showing that the MCP STDIO transport lets arbitrary command execution slip through unchecked, and that 9 of 11 MCP marketplaces they tested were poisonable. Anthropic's response: STDIO is out of scope for protocol-level fixes, the ecosystem is responsible for operational trust.

Fair — Anthropic donated MCP to the Linux Foundation's Agentic AI Foundation in December 2025 specifically so independent infrastructure could grow around it. But that leaves a real gap for anyone running Claude Code today: how do you know whether an MCP server you're about to invoke is trustworthy?

Anthropic's official registry is pure metadata (license, commit count, popularity). mcp-scorecard.ai scores repos, not behavior. BlueRock runs OWASP-style static scans. None of these ask the one question that actually matters:

Does this MCP server, in real call-time use, work?

So I built a small thing to answer it.

The hook

A zero-config Claude Code hook that does two things on every MCP tool call:

  1. Before the call — queries a public trust API for that server. If the score is low, Claude shows an inline warning:
   ⚠ XAIP: "some-server" trust=0.32 (caution, 87 receipts) Risk: high_error_rate
  2. After the call — emits an Ed25519-signed receipt (success, latency, hashed input/output) to a public aggregator that updates the score.
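The pre-call check boils down to mapping a trust-API response onto that inline warning. Here's a minimal sketch in JavaScript — `formatWarning` and the response shape are assumptions for illustration, not the package's actual internals:

```javascript
// Sketch of the pre-call warning logic. The function name, the verdict
// strings, and the trust-API response shape are assumptions, not the
// real xaip-claude-hook internals.
function formatWarning(server, trust) {
  // Warn only on non-trusted verdicts; the hook's real cutoffs are unpublished.
  if (!["caution", "low_trust"].includes(trust.verdict)) return null;
  const flag = trust.flag ? ` Risk: ${trust.flag}` : "";
  return `⚠ XAIP: "${server}" trust=${trust.score.toFixed(2)} ` +
    `(${trust.verdict}, ${trust.receipts} receipts)${flag}`;
}

console.log(formatWarning("some-server", {
  score: 0.32, verdict: "caution", receipts: 87, flag: "high_error_rate",
}));
// → ⚠ XAIP: "some-server" trust=0.32 (caution, 87 receipts) Risk: high_error_rate
```

A `trusted` or `insufficient_data` verdict produces no warning at all, so well-behaved servers stay silent in the session.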

Install:

npm install -g xaip-claude-hook
xaip-claude-hook install

Next MCP call fires the hook. That's the whole UX.

What a receipt looks like

No raw content leaves your machine — only hashes.

{
  "agentDid":      "did:web:context7",
  "callerDid":     "did:key:a1c6cd34…",
  "toolName":      "resolve-library-id",
  "taskHash":      "9f3e…",   // sha256(input).slice(0,16)
  "resultHash":    "1b78…",   // sha256(response).slice(0,16)
  "success":       true,
  "latencyMs":     668,
  "failureType":   "",
  "timestamp":     "2026-04-17T04:24:59.925Z",
  "signature":     "...",     // Ed25519 over canonical JSON (agent key)
  "callerSignature": "..."    // Ed25519 over canonical JSON (caller key)
}

The aggregator rejects anything that fails signature verification. The trust API computes a Bayesian score across all verified receipts per server, weighted by caller diversity — so one enthusiastic installer can't fake a reputation.
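A diversity-weighted Bayesian score along those lines might look like the following. The aggregator's actual formula isn't published here, so treat this as a sketch of the idea, not the implementation:

```javascript
// Illustrative diversity-weighted Beta score: heavy callers count
// sub-linearly, so one installer can't single-handedly build a reputation.
function trustScore(receipts) {
  const byCaller = new Map();
  for (const r of receipts) {
    if (!byCaller.has(r.callerDid)) byCaller.set(r.callerDid, []);
    byCaller.get(r.callerDid).push(r);
  }
  let ok = 0, total = 0;
  for (const rs of byCaller.values()) {
    const w = 1 / Math.sqrt(rs.length); // down-weight high-volume callers
    for (const r of rs) { total += w; if (r.success) ok += w; }
  }
  return (ok + 1) / (total + 2); // posterior mean under a Beta(1,1) prior
}

// 100 successes from one caller score lower than 100 spread across ten.
const solo = Array.from({ length: 100 }, () => ({ callerDid: "me", success: true }));
const crowd = [];
for (let i = 0; i < 10; i++)
  for (let j = 0; j < 10; j++) crowd.push({ callerDid: `c${i}`, success: true });
console.log(trustScore(crowd) > trustScore(solo)); // true
```

The Beta(1,1) prior also means an empty or tiny receipt set sits near 0.5 rather than claiming confidence either way.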

What the scores actually look like right now

Being transparent: the dataset is small. A curl against the live trust API today:

| Server     | Trust | Verdict   | Receipts | Flag                 |
|------------|-------|-----------|----------|----------------------|
| memory     | 0.800 | trusted   | 112      |                      |
| git        | 0.775 | trusted   | 35       |                      |
| sqlite     | 0.753 | trusted   | 42       |                      |
| puppeteer  | 0.671 | caution   | 32       | high_error_rate      |
| context7   | 0.618 | caution   | 560      | low_caller_diversity |
| filesystem | 0.579 | caution   | 610      | low_caller_diversity |
| playwright | 0.394 | low_trust | 37       | high_error_rate      |
| fetch      | 0.365 | low_trust | 36       | high_error_rate      |

Verify any of these yourself:

curl https://xaip-trust-api.kuma-github.workers.dev/v1/trust/context7

The low_caller_diversity flag on high-volume servers is the single most honest number in that table. It means: I'm the biggest caller right now, and that's exactly the problem this tool is supposed to solve. The flag only clears when independent installers start generating receipts — which is what the npm package is for.

Why this is architecturally different from existing approaches

Every other "MCP trust" project I've seen scores the repository:

  • Commit frequency, license, stars, contributor count (mcp-scorecard.ai)
  • Static source-code vulnerability scans (BlueRock)
  • Registry inclusion as implicit trust (official MCP registry)

These are useful proxies, but none of them tell you whether a server works in practice. A well-maintained repo can have a buggy release; a single-author repo can be rock solid; a newly forked malicious repo looks identical to the original under static scan.

XAIP scores observed behavior. Every call is a signed attestation. The scoring is Bayesian, so:

  • Servers with few receipts get insufficient_data — no verdict, no warning
  • High-variance patterns (mixed success/failure) get lower confidence
  • The high_error_rate flag is computed from real response content, classifying quota exceeded, rate limit, unauthorized, and "isError": true as failures
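That failure classification can be sketched as a small pattern matcher. The real `inferSuccess` implementation isn't published, so this is illustrative; the patterns come straight from the list above:

```javascript
// Illustrative failure classifier over response text. The real
// inferSuccess heuristic may use different or additional patterns.
function inferSuccess(responseText) {
  const failurePatterns = [
    /quota exceeded/i,
    /rate limit/i,
    /unauthorized/i,
    /"isError":\s*true/,
  ];
  return !failurePatterns.some((re) => re.test(responseText));
}

console.log(inferSuccess('{"result": "ok"}'));          // true
console.log(inferSuccess("Error: rate limit reached")); // false
```

Being regex-based, it inherits exactly the false-positive/negative risk discussed in the limitations section below: a legitimate document that happens to contain the string "rate limit" would be miscounted as a failure.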

This is the same split you see in supply-chain security between OpenSSF Scorecard and runtime attestation: you want both, but only one of them catches regressions in production.

What's missing / where this could go wrong

I want to be specific about limitations, because "AI trust protocol" posts tend to overpromise:

  • ~10 servers, ~1500 receipts total. Small. This post is partly an ask for installers to fix that.
  • One aggregator node. Byzantine fault tolerance requires quorum; right now there's one Cloudflare Worker. Quorum needs multiple operators, which is the next milestone.
  • Client-side inferSuccess is heuristic. We look at response text for error patterns. False positives and negatives are possible — fetch's 36% error rate might be over-counted (legit 404s shouldn't hurt the server's score) or real.
  • Privacy model relies on hashes, not ZK. Inputs and outputs are hashed before transmission, but statistical correlation across taskHashes is possible in principle. Migration to ZK receipt aggregation is a future idea, not a current feature.
  • I personally generated most of the high-volume receipts. The low_caller_diversity flag you see on context7 and filesystem is me.

Running it yourself

npm install -g xaip-claude-hook
xaip-claude-hook install
xaip-claude-hook status

Open a new Claude Code session. Call any MCP tool. Check:

cat ~/.xaip/hook.log

You'll see lines like:

2026-04-17T04:24:59Z POST context7/resolve-library-id ok=true lat=668ms → 200

And the next time you (or Claude) invoke a low-trust server, the warning shows up inline.

Uninstall is a single command. Keys under ~/.xaip/ persist — delete manually to wipe.

Links

Issues, scoring bugs, angry takes — all welcome on GitHub. If you maintain an MCP server and your score looks wrong, I want to hear about it first.