
I built a fully deterministic control layer for agents. Would love feedback. Not a sales pitch

Reddit r/artificial / 2026/3/30


Key points

  • The post proposes a fully deterministic "control layer" that sits in the execution path between AI agents and their tools, allowing, blocking, or requiring approval for each attempted action in real time.
  • As a security mechanism beyond prompt/identity/output controls, it emphasizes credential starvation, session-based risk escalation across an agent's whole behavior, and HITL (human-in-the-loop) only when risk is high enough to warrant it.
  • "Autonomy zones" vary how much freedom an agent gets based on the sensitivity of the environment and the actions it performs (e.g., read-only, external writes, sensitive systems).
  • Instead of blanket permissions, it enforces fine-grained per-tool/per-action policy (endpoints, parameters, frequency, sequence) and supports a hash-chained, tamper-evident audit log that also records near-miss attempts.
  • A policy engine drives these decisions flexibly (rather than via hardcoded rules), and setup takes roughly 10 minutes. The author is asking for feedback, not pitching a company.

Most of the current “AI security” stack seems focused on:

• prompts
• identities
• outputs

After an agent deleted a prod database on me a year ago, I saw the gap and started building:

A control layer directly in the execution path between agents and tools. We're going to market, but I don't want to spam y'all with our company name, so I left it out.

What that actually means

Every time an agent tries to take an action (API call, DB read, file access, etc.), we intercept it and decide in real time:

• allow
• block
• require approval

But the important part is how that decision is made.
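Roughly, the hook looks like the sketch below. This is a minimal illustration with made-up names (`Decision`, `ActionRequest`, `intercept`), not our actual API:

```python
# Minimal sketch of the interception point: nothing executes until a decision is made.
from enum import Enum
from dataclasses import dataclass
from typing import Callable

class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REQUIRE_APPROVAL = "require_approval"

@dataclass
class ActionRequest:
    agent_id: str
    tool: str      # e.g. "postgres", "http", "filesystem"
    action: str    # e.g. "read", "write", "DELETE /rows"
    params: dict

def intercept(request: ActionRequest,
              decide: Callable[[ActionRequest], Decision],
              approve: Callable[[ActionRequest], bool],
              execute: Callable[[ActionRequest], object]):
    """Sits in the execution path between the agent and the tool."""
    decision = decide(request)
    if decision is Decision.BLOCK:
        raise PermissionError(f"blocked: {request.tool}.{request.action}")
    if decision is Decision.REQUIRE_APPROVAL and not approve(request):
        raise PermissionError(f"approval denied: {request.tool}.{request.action}")
    return execute(request)
```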

A few things we’re doing differently

  1. Credential starvation (instead of trusting long-lived access)

Agents don’t get broad, persistent credentials.

They effectively operate with nothing by default, and access is granted per action based on policy + context.
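A minimal sketch of the pattern, assuming some vault/STS-style issuer sits behind `mint_credential` (all names here are invented):

```python
# Sketch: no standing credentials; a narrowly scoped, short-lived token per approved action.
import secrets
import time

def mint_credential(tool: str, action: str, ttl_seconds: int = 30) -> dict:
    """Stand-in for a vault/STS call that issues a token scoped to exactly one action."""
    return {
        "token": secrets.token_urlsafe(32),
        "scope": f"{tool}:{action}",
        "expires_at": time.time() + ttl_seconds,
    }

def with_ephemeral_credential(tool: str, action: str, call):
    """Mint, use, forget: the agent never holds anything broad or long-lived."""
    cred = mint_credential(tool, action)
    return call(cred)  # `call` is whatever actually performs the tool invocation
```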

  2. Session-based risk escalation (not stateless checks)

We track behavior across the entire session.

Example:

• one DB read → fine
• 20 sequential reads + export → risk escalates
• tool chaining → risk escalates

So decisions aren’t per-call—they’re based on what the agent has been doing over time.
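A toy version of that session-level scoring, with invented weights and pattern checks just to show the shape:

```python
# Invented scoring: the same call scores differently depending on session history.
from collections import Counter

class SessionRisk:
    def __init__(self):
        self.counts = Counter()   # (tool, action) -> calls so far this session
        self.tools_used = []      # tools in the order they were first used

    def observe(self, tool: str, action: str) -> float:
        self.counts[(tool, action)] += 1
        if tool not in self.tools_used:
            self.tools_used.append(tool)

        score = 0.0
        if self.counts[(tool, action)] > 10:         # e.g. 20 sequential DB reads
            score += 1.5
        if len(self.tools_used) >= 3:                 # tool chaining across the session
            score += 1.0
        if ("db", "read") in self.counts and action in {"export", "upload"}:
            score += 2.0                              # read-then-exfiltrate pattern
        return score
```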

  3. HITL only when it actually matters

We don’t want humans in the loop for everything.

Instead:

• low risk → auto allow
• medium risk → maybe constrained
• high risk → require approval

The idea is targeted interruption, not constant friction.
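In rough pseudocode, with placeholder bands (the real thresholds would come from policy, not constants):

```python
# Placeholder risk bands: only the top band interrupts a human.
def triage(risk_score: float) -> str:
    if risk_score < 1.0:
        return "allow"               # low risk: fully automatic
    if risk_score < 3.0:
        return "allow_constrained"   # medium: allow, but cap params / rate-limit
    return "require_approval"        # high: a human signs off before execution
```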

  4. Autonomy zones

Different environments/actions have different trust levels.

Example:

• read-only internal data → low autonomy constraints
• external API writes → tighter controls
• sensitive systems → very restricted

Agents can operate freely within a zone, but crossing boundaries triggers stricter enforcement.
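A toy illustration of zones; the zone names, limits, and classification logic below are all invented for the example:

```python
# Invented zones and limits, just to show the idea of graduated autonomy.
from typing import Optional

ZONES = {
    "read_only_internal": {"max_risk_auto_allow": 3.0, "hitl": False},
    "external_writes":    {"max_risk_auto_allow": 1.0, "hitl": True},
    "sensitive_systems":  {"max_risk_auto_allow": 0.0, "hitl": True},
}

def zone_for(tool: str, action: str) -> str:
    """Crude classification for the sketch; a real mapping would be policy-driven."""
    if tool in {"payments", "prod_db"}:
        return "sensitive_systems"
    if action.startswith(("write", "POST", "DELETE")):
        return "external_writes"
    return "read_only_internal"

def crossing_into_stricter_zone(prev_zone: Optional[str], new_zone: str) -> bool:
    """Moving into a stricter zone mid-session is itself a signal."""
    order = ["read_only_internal", "external_writes", "sensitive_systems"]
    return prev_zone is not None and order.index(new_zone) > order.index(prev_zone)
```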

  5. Per-tool, per-action control (not blanket policies)

Not just “this agent can use X tool”

More like:

• what endpoints
• what parameters
• what frequency
• in what sequence

So risk is evaluated at a much more granular level.
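As data, a per-tool rule might look something like this (everything below is invented, not a real schema):

```python
# Hypothetical per-tool rule: endpoints, parameters, frequency, and sequence.
BILLING_API_RULES = {
    "tool": "billing_api",
    "allowed_endpoints": ["GET /invoices", "GET /invoices/{id}"],
    "denied_endpoints":  ["POST /refunds", "DELETE /invoices/{id}"],
    "param_constraints": {"limit": {"max": 100}},              # no bulk dumps
    "rate_limit":        {"max_calls": 30, "per_seconds": 60},
    "sequence_rules": [
        # reading customer PII and then calling an external egress tool needs approval
        {"after": "crm.read_pii", "action": "http.post_external", "then": "require_approval"},
    ],
}
```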

  6. Hash-chained audit log (including near-misses)

Every action (allowed, blocked, escalated) is:

• logged
• chained
• tamper-evident

Including “almost bad” behavior, not just incidents.

This ended up being more useful than expected for understanding agent behavior.
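The chaining itself is simple; here's a minimal self-contained sketch of the idea:

```python
# Each entry commits to the previous one, so editing or deleting a record
# breaks verification for everything after it.
import hashlib
import json
import time

class AuditLog:
    def __init__(self):
        self.entries = []
        self._prev_hash = "0" * 64  # genesis value

    def record(self, request_summary: dict, decision: str) -> dict:
        entry = {
            "ts": time.time(),
            "request": request_summary,
            "decision": decision,   # allowed, blocked, escalated: near-misses included
            "prev_hash": self._prev_hash,
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self._prev_hash = entry["hash"]
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev_hash"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True
```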

  7. Policy engine (not hardcoded rules)

All of this runs through a policy layer (think flexible rules vs static checks), so behavior can adapt without rewriting code.
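A tiny illustration of rules-as-data (made-up schema): the engine walks whatever rules are loaded from config, so changing behavior means editing data, not code:

```python
# Default-deny evaluation over rules loaded from config (schema is invented).
def evaluate(rules: list[dict], request: dict) -> str:
    for rule in rules:
        if all(request.get(k) == v for k, v in rule["match"].items()):
            return rule["effect"]   # "allow" | "block" | "require_approval"
    return "block"                   # nothing matched: default-deny

rules = [
    {"match": {"tool": "db", "action": "read"},  "effect": "allow"},
    {"match": {"tool": "db", "action": "write"}, "effect": "require_approval"},
]
print(evaluate(rules, {"tool": "db", "action": "write"}))  # -> require_approval
```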

  8. Setup is fast (~10 min)

We tried to avoid the “months of integration” problem.

If it’s not easy to sit in the execution path, nobody will actually use it.

Why we think this matters

The failure mode we keep seeing:

agents don’t fail because of one bad prompt —

they fail because of a series of individually reasonable actions that become risky together

Most tooling doesn’t really account for that.

Would love feedback from people actually building agents

• Have you seen agents drift into risky behavior over time?
• How are you controlling tool usage today (if at all)?
• Does session-level risk make sense, or is that overkill?
• Is “credential starvation” realistic in your setups?

We're just two security guys who built a company, not some super-funded McKinsey bros. We have our first big design partners starting this month and need all the feedback from the community we can get.

submitted by /u/EbbCommon9300
