Session Risk Memory (SRM): Temporal Authorization for Deterministic Pre-Execution Safety Gates

arXiv cs.AI / 3/25/2026

Key Points

The paper argues that deterministic per-action safety gates can be defeated by distributed attacks that split harmful intent across individually compliant steps, leaving a gap in “temporal” security at the session/trajectory level.
It introduces Session Risk Memory (SRM), a lightweight deterministic module that adds trajectory-level authorization by maintaining a compact semantic centroid and accumulating a risk signal via an exponential moving average of baseline-subtracted gate outputs.
SRM is designed to require no additional model components, training, or probabilistic inference, because it operates on the same semantic vector representation as the underlying authorization gate.
Experiments on an 80-session multi-turn benchmark (slow-burn exfiltration, gradual privilege escalation, and compliance drift) show that ILION+SRM achieves F1 = 1.0000 with a 0% false-positive rate, versus F1 = 0.9756 with a 5% false-positive rate for the stateless ILION gate alone; both systems detect 100% of attack sessions.
The approach formalizes a distinction between spatial authorization consistency (per action) and temporal authorization consistency (over the trajectory), aiming to give session-level safety in agentic systems a principled basis while adding under 250 microseconds of overhead per turn.
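The per-turn risk update described above can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the authors' implementation: the class name `SessionRiskAccumulator`, the score scale (higher means riskier), and the values of `alpha`, `baseline`, and both thresholds are hypothetical. It covers only the exponential moving average over baseline-subtracted gate outputs; the semantic centroid is omitted because the abstract does not say how it feeds into the risk signal. The example contrasts a spatial check (each action judged alone) with a temporal one (accumulated session risk).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SessionRiskAccumulator:
    """Hypothetical sketch of SRM-style temporal authorization (risk signal only).

    Assumed, not from the paper: gate scores are floats where higher means
    riskier, and every numeric default below is an illustrative value.
    """
    alpha: float = 0.5               # EMA smoothing factor (assumed)
    baseline: float = 0.2            # expected gate score of a benign turn (assumed)
    action_threshold: float = 0.5    # stateless per-action cutoff (assumed)
    session_threshold: float = 0.12  # cutoff on the accumulated EMA risk (assumed)
    risk: float = 0.0                # accumulated session risk

    def step(self, gate_score: float) -> Tuple[bool, bool]:
        """Fold one turn into the session; return (action_allowed, session_allowed)."""
        # Spatial check: the stateless gate judges this action in isolation.
        action_allowed = gate_score < self.action_threshold
        # Temporal check: EMA over the baseline-subtracted gate output.
        self.risk = (1 - self.alpha) * self.risk + self.alpha * (gate_score - self.baseline)
        session_allowed = self.risk < self.session_threshold
        return action_allowed, session_allowed


def run(scores: List[float]) -> List[Tuple[bool, bool]]:
    """Replay a sequence of per-turn gate scores through a fresh accumulator."""
    srm = SessionRiskAccumulator()
    return [srm.step(s) for s in scores]


# Hypothetical slow-burn session: every turn stays under the per-action cutoff,
# but the scores sit persistently above the benign baseline.
slow_burn = [0.30, 0.35, 0.40, 0.40, 0.45]
for turn, (action_ok, session_ok) in enumerate(run(slow_burn), start=1):
    print(f"turn {turn}: action_allowed={action_ok}, session_allowed={session_ok}")
```

In this hypothetical slow-burn session every turn passes the per-action check, but the session-level check blocks from the third turn onward because the scores stay above the baseline. A benign session whose scores hover around the baseline, such as `[0.20, 0.25, 0.15, 0.20, 0.22]`, keeps the EMA near zero and is never blocked.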

Abstract

Deterministic pre-execution safety gates evaluate whether individual agent actions are compatible with their assigned roles. While effective at per-action authorization, these systems are structurally blind to distributed attacks that decompose harmful intent across multiple individually compliant steps. This paper introduces Session Risk Memory (SRM), a lightweight deterministic module that extends stateless execution gates with trajectory-level authorization. SRM maintains a compact semantic centroid representing the evolving behavioral profile of an agent session and accumulates a risk signal through an exponential moving average over baseline-subtracted gate outputs. It operates on the same semantic vector representation as the underlying gate, requiring no additional model components, training, or probabilistic inference. We evaluate SRM on a multi-turn benchmark of 80 sessions containing slow-burn exfiltration, gradual privilege escalation, and compliance drift scenarios. Results show that ILION+SRM achieves F1 = 1.0000 with a 0% false positive rate (FPR), compared to stateless ILION at F1 = 0.9756 with 5% FPR, while maintaining a 100% detection rate for both systems. Critically, SRM eliminates all false positives with a per-turn overhead under 250 microseconds. The framework introduces a conceptual distinction between spatial authorization consistency (evaluated per action) and temporal authorization consistency (evaluated over the trajectory), providing a principled basis for session-level safety in agentic systems.
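The reported scores follow from the standard formula F1 = 2TP / (2TP + FP + FN). The abstract does not say how the 80 sessions divide between attack and benign; the sketch below assumes 40 of each, the split under which a 5% FPR corresponds to a whole number of flagged benign sessions (2 of 40) and which reproduces F1 = 0.9756. The confusion counts are therefore inferred, not quoted from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of benign sessions wrongly flagged."""
    return fp / (fp + tn)


# Assumed confusion counts for 40 attack + 40 benign sessions (inferred, not from the paper).
systems = {
    "ILION": {"tp": 40, "fp": 2, "fn": 0, "tn": 38},      # 100% detection, 2 benign flagged
    "ILION+SRM": {"tp": 40, "fp": 0, "fn": 0, "tn": 40},  # 100% detection, none flagged
}

for name, c in systems.items():
    f1 = f1_score(c["tp"], c["fp"], c["fn"])
    fpr = false_positive_rate(c["fp"], c["tn"])
    print(f"{name}: F1={f1:.4f}, FPR={fpr:.0%}")
```

Under that assumption, stateless ILION's two false positives give a precision of 40/42 ≈ 0.952 with recall 1.0, hence F1 = 80/82 ≈ 0.9756; removing them yields F1 = 1.0000.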
