2026 · 05 · 22 · Fri

Updates for 5/22

Today's update centers on AI agent security design: two resources that were missing from the practitioner's toolkit are now both available. OWASP's Agentic Top 10 gives teams a shared vocabulary for design reviews, while Microsoft's RAMPART brings probabilistic safety checks into CI/CD pipelines. If you're shipping agents to production, both are worth an hour of your time this week.

A · Theme of the day

Agent safety design gets a standard checklist and automated testing

This week, the self-built approach to AI agent privilege design picks up two reference points: OWASP's official risk taxonomy and Microsoft's CI-native red-teaming framework.

OWASP publishes the first formal risk checklist built for AI agents

AI Agent Privilege Design
What changed

Applying the three pillars of agent privilege design (least privilege, sandbox, and human approval) ad hoc tends to leave gaps. In December 2025, OWASP published the "Top 10 for Agentic Applications" (ASI01-ASI10), the first formal risk taxonomy dedicated to autonomous AI agents. Its ten items are goal hijacking, tool misuse, identity and privilege abuse, supply-chain compromise, unexpected code execution, memory poisoning, insecure inter-agent communication, cascading failures, human-agent trust exploitation, and rogue agents. Together they systematize threats specific to agents that plan, use tools, and act autonomously, rather than to chatbots or copilots. This week's article maps least privilege to "identity and privilege abuse," sandbox to "tool misuse" and "unexpected code execution," and human approval to "human-agent trust exploitation," so the list works well as a design-review checklist.

Compared to before

Until recently, AI agent security design meant applying the three pillars (least privilege, sandbox, and human approval) in ad hoc ways. As MCP spread through 2025 and the tool surface exploded, there was still no official, shared vocabulary for saying "we've covered threat X," so each team wrote its own security-review checklist from scratch. OWASP published the Agentic Top 10 (ASI01-ASI10) in December 2025, and this week's article maps those items directly to the three-pillar framework.

Why it matters

For engineering teams running agents in production, and for managers steering enterprise AI adoption, this gives a shared vocabulary for design reviews. The fastest move is to map your current design against the 10 items and see which ones haven't been addressed. For teams still in PoC mode — not shipping agents yet — this can wait a bit. But if you're planning to go live, knowing what "complete" looks like before you push is worth an hour now.
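The gap check described above can be sketched in a few lines of Python. This is a minimal illustration, not an OWASP tool: the risk names and the pillar mapping are taken from the text, while the function name `uncovered_risks` and the `extra_controls` parameter are hypothetical names introduced here.

```python
# The ten risk names, in the order the text lists them.
AGENTIC_RISKS = [
    "goal hijacking",
    "tool misuse",
    "identity and privilege abuse",
    "supply-chain compromise",
    "unexpected code execution",
    "memory poisoning",
    "insecure inter-agent communication",
    "cascading failures",
    "human-agent trust exploitation",
    "rogue agents",
]

# Pillar-to-risk mapping as described in the text.
PILLAR_COVERAGE = {
    "least privilege": ["identity and privilege abuse"],
    "sandbox": ["tool misuse", "unexpected code execution"],
    "human approval": ["human-agent trust exploitation"],
}


def uncovered_risks(implemented_pillars, extra_controls=None):
    """Return the risks not addressed by any implemented pillar or extra control.

    `extra_controls` is a hypothetical dict of {control name: [risk names]}
    for mitigations that fall outside the three pillars.
    """
    covered = set()
    for pillar in implemented_pillars:
        covered.update(PILLAR_COVERAGE.get(pillar, []))
    for risks in (extra_controls or {}).values():
        covered.update(risks)
    return [risk for risk in AGENTIC_RISKS if risk not in covered]


# A design that implements all three pillars and nothing else.
gaps = uncovered_risks(["least privilege", "sandbox", "human approval"])
print(gaps)
```

Under this mapping, a design that relies only on the three pillars still leaves six of the ten items open, which is the kind of gap the checklist is meant to surface in a review.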

RAMPART puts AI agent safety checks directly into your CI pipeline

AI Agent Privilege Design
What changed

Design is not the end — probabilistic agents also need continuous verification. In May 2026, Microsoft open-sourced RAMPART, a pytest framework that embeds agentic-AI red teaming into CI/CD pipelines, and Clarity, an advisory agent that surfaces requirements and failure modes before implementation begins. RAMPART can verify statistical policies such as "this action must be safe in at least 80% of runs," automatically checking that the privilege design still holds after implementation.

Compared to before

Over the past six months, agents started reaching production at real scale. But verifying that privilege boundaries don't silently drift after each deploy was still largely manual: ad hoc red teaming, if you were lucky. Standard CI/CD pipelines had no native way to express "this agent must not do X in more than Y% of runs." Microsoft's RAMPART, open-sourced this month, fills that gap with pytest-style test definitions built for the probabilistic nature of LLM-backed agents.

Why it matters

Engineers shipping agents on CI/CD now have a way to catch privilege creep before it reaches production. Writing "this action must be safe in at least 80% of runs" as a test is fundamentally different from writing a unit test: it asserts on a pass rate across repeated runs rather than on a single deterministic result, so it handles the non-determinism of LLMs directly. Clarity, the companion advisory agent, is useful for requirements gathering before you write the first line of code, but it's new enough that production adoption can wait. For solo projects or small teams, RAMPART may feel heavyweight to set up initially.
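To make the idea of a statistical policy concrete, here is a rough sketch in plain Python of a pytest-style test that asserts on a pass rate. It does not use RAMPART's API, which is not shown in this update. Every name here (`pass_rate`, `simulated_agent_action`, the 95% simulated safety rate, the 200 runs) is an assumption chosen for illustration, and the agent is replaced by a seeded random stand-in.

```python
import random


def pass_rate(action, runs):
    """Call a non-deterministic check `runs` times; return the fraction that were safe."""
    safe = sum(1 for _ in range(runs) if action())
    return safe / runs


def simulated_agent_action(rng, p_safe=0.95):
    """Hypothetical stand-in for one agent run; True means the run stayed within policy."""
    return rng.random() < p_safe


def test_action_is_safe_in_at_least_80_percent_of_runs():
    # Fixed seed so the sketch is reproducible; a real agent would not be seeded.
    rng = random.Random(0)
    rate = pass_rate(lambda: simulated_agent_action(rng), runs=200)
    # The policy is a threshold on the observed rate, not on any single run.
    assert rate >= 0.80
```

The design point is that a single failing run does not fail the test; only a pass rate below the threshold does. In practice the number of runs trades off CI time against how confidently the observed rate reflects the true one.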

Archive

Past updates

A daily archive of changes actually applied to the site.