Most of the current “AI security” stack seems focused on:
• prompts
• identities
• outputs

After an agent deleted a prod database on me a year ago, I saw the gap and started building:
a control layer that sits directly in the execution path between agents and tools. We’re in market, but I don’t want to spam y’all with our company, so I left the name out.
⸻
What that actually means
Every time an agent tries to take an action (API call, DB read, file access, etc.), we intercept it and decide in real time:
• allow
• block
• require approval

But the important part is how that decision is made.
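A toy sketch of what that interception point could look like. All names here (`ToolCall`, `decide`) are illustrative, not our actual API, and the rules are made up:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str      # e.g. "postgres"
    action: str    # e.g. "drop_table"
    params: dict

def decide(call: ToolCall) -> str:
    """Toy policy: return 'allow', 'block', or 'require_approval'."""
    destructive = {"drop_table", "delete_rows", "truncate"}
    if call.action in destructive:
        return "require_approval"   # a human sees it before it runs
    if call.tool == "filesystem" and call.params.get("path", "").startswith("/etc"):
        return "block"              # hard no, regardless of context
    return "allow"
```

The real decision is context-aware (more on that below); this just shows where the hook sits.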
⸻
A few things we’re doing differently
- Credential starvation (instead of trusting long-lived access)
Agents don’t get broad, persistent credentials.
They effectively operate with nothing by default, and access is granted per action based on policy + context.
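One way to picture it: a credential broker that mints a narrowly scoped, short-lived token per approved action, so the agent holds nothing standing. This is a simplified sketch, not our implementation:

```python
import secrets
import time

class CredentialBroker:
    """Sketch: agents hold no persistent credentials; each approved
    action gets a token scoped to exactly one (tool, action) pair
    that expires after a short TTL."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._grants = {}  # token -> (scope, expiry)

    def grant(self, tool: str, action: str) -> str:
        token = secrets.token_urlsafe(16)
        self._grants[token] = ((tool, action), time.monotonic() + self.ttl)
        return token

    def check(self, token: str, tool: str, action: str) -> bool:
        entry = self._grants.get(token)
        if entry is None:
            return False
        scope, expiry = entry
        if time.monotonic() > expiry:
            del self._grants[token]    # expired tokens are useless
            return False
        return scope == (tool, action)
```

A stolen token is near-worthless: wrong scope fails, and it dies in seconds anyway.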
⸻
- Session-based risk escalation (not stateless checks)
We track behavior across the entire session.
Example:
• one DB read → fine
• 20 sequential reads + export → risk escalates
• tool chaining → risk escalates

So decisions aren’t per-call; they’re based on what the agent has been doing over time.
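Roughly like this, where risk accumulates over the session instead of being judged per call (weights and thresholds here are made up for illustration):

```python
class SessionRisk:
    """Toy session-level scorer: a burst of reads, an export after
    that burst, and tool chaining each push the score up."""

    def __init__(self):
        self.score = 0.0
        self.reads = 0
        self.last_tool = None

    def observe(self, tool: str, action: str) -> float:
        if action == "read":
            self.reads += 1
            if self.reads > 10:          # burst of reads looks like staging for exfil
                self.score += 1.0
        if action == "export":
            self.score += 3.0 if self.reads > 10 else 1.0
        if self.last_tool is not None and tool != self.last_tool:
            self.score += 0.5            # tool chaining
        self.last_tool = tool
        return self.score
```

The first read scores zero; the same read as call #16, followed by an export through a different tool, lands in a very different place.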
⸻
- HITL only when it actually matters
We don’t want humans in the loop for everything.
Instead:
• low risk → auto allow
• medium risk → maybe constrained
• high risk → require approval

The idea is targeted interruption, not constant friction.
⸻
- Autonomy zones
Different environments/actions have different trust levels.
Example:
• read-only internal data → low autonomy constraints
• external API writes → tighter controls
• sensitive systems → very restricted

Agents can operate freely within a zone, but crossing boundaries triggers stricter enforcement.
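A minimal way to model that: each zone carries an autonomy level, and moving into a more restricted zone is what triggers escalation. Zone names and levels here are hypothetical:

```python
# Hypothetical zone map: how much freedom the agent has in each.
ZONES = {
    "internal_readonly": {"autonomy": "high"},
    "external_write":    {"autonomy": "low"},
    "sensitive":         {"autonomy": "none"},
}

def crossing_requires_escalation(from_zone: str, to_zone: str) -> bool:
    """Moving into a more restricted zone triggers stricter enforcement;
    moving back out does not."""
    order = {"high": 0, "low": 1, "none": 2}
    return order[ZONES[to_zone]["autonomy"]] > order[ZONES[from_zone]["autonomy"]]
```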
⸻
- Per-tool, per-action control (not blanket policies)
Not just “this agent can use X tool.”
More like:
• what endpoints
• what parameters
• what frequency
• in what sequence

So risk is evaluated at a much more granular level.
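For a flavor of the granularity, here’s a toy policy keyed on (tool, method) with endpoint allowlists and rate limits. The endpoints and limits are invented for the example:

```python
import time

# Illustrative fine-grained policy: not "the agent may use HTTP",
# but which endpoints, with which method, at what frequency.
POLICY = {
    ("http", "GET"): {
        "allowed_endpoints": {"api.internal/users", "api.internal/orders"},
        "max_per_minute": 30,
    },
    ("http", "POST"): {
        "allowed_endpoints": {"api.internal/orders"},
        "max_per_minute": 5,
    },
}

def permitted(tool, method, endpoint, recent_calls, now=None):
    """recent_calls: timestamps of prior calls for this (tool, method)."""
    rule = POLICY.get((tool, method))
    if rule is None or endpoint not in rule["allowed_endpoints"]:
        return False
    now = time.monotonic() if now is None else now
    in_window = [t for t in recent_calls if now - t < 60]
    return len(in_window) < rule["max_per_minute"]
```

Sequence constraints (e.g. “no export right after a bulk read”) layer on top of this, via the session state described above.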
⸻
- Hash-chained audit log (including near-misses)
Every action (allowed, blocked, escalated) is:
• logged
• chained
• tamper-evident

Including “almost bad” behavior, not just incidents.
This ended up being more useful than expected for understanding agent behavior.
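The chaining idea is standard: each entry’s hash commits to the previous entry, so editing any record breaks everything after it. A minimal sketch:

```python
import hashlib
import json

class AuditLog:
    """Hash-chained log: entry N's hash covers entry N-1's hash,
    so tampering with any record is detectable."""

    def __init__(self):
        self.entries = []
        self._prev = "0" * 64  # genesis value

    def append(self, record: dict) -> str:
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((self._prev + payload).encode()).hexdigest()
        self.entries.append({"record": record, "hash": digest, "prev": self._prev})
        self._prev = digest
        return digest

    def verify(self) -> bool:
        prev = "0" * 64
        for e in self.entries:
            payload = json.dumps(e["record"], sort_keys=True)
            if e["prev"] != prev:
                return False
            if hashlib.sha256((prev + payload).encode()).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True
```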
⸻
- Policy engine (not hardcoded rules)
All of this runs through a policy layer (think flexible rules vs static checks), so behavior can adapt without rewriting code.
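The point is that rules live as data, evaluated generically, so tightening a policy is an edit, not a deploy. A deliberately tiny sketch (our actual engine is richer than first-match-wins):

```python
# Hypothetical declarative rules, first match wins, default-deny.
RULES = [
    {"match": {"tool": "db", "action": "drop_table"}, "verdict": "require_approval"},
    {"match": {"tool": "db"}, "verdict": "allow"},
]

def evaluate(event: dict, rules=RULES) -> str:
    for rule in rules:
        if all(event.get(k) == v for k, v in rule["match"].items()):
            return rule["verdict"]
    return "block"  # anything unmatched is denied
```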
⸻
- Setup is fast (~10 min)
We tried to avoid the “months of integration” problem.
If it’s not easy to sit in the execution path, nobody will actually use it.
⸻
Why we think this matters
The failure mode we keep seeing:
agents don’t fail because of one bad prompt;
they fail because of a series of individually reasonable actions that become risky together.
Most tooling doesn’t really account for that.
⸻
Would love feedback from people actually building agents
• Have you seen agents drift into risky behavior over time?
• How are you controlling tool usage today (if at all)?
• Does session-level risk make sense, or is that overkill?
• Is “credential starvation” realistic in your setups?

We’re just two security guys who built a company, not some McKinsey bros who are super funded. We have our first big design partners starting this month and need all the feedback from the community we can get.




