Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms

arXiv cs.CL — April 24, 2026


Key Points

  • The paper argues that current AI-agent guardrails are “memoryless,” allowing attackers to spread a single attack across many sessions and evade detectors that judge each message in isolation.
  • It introduces CSTM-Bench, a benchmark of 26 executable attack taxonomies classified by kill-chain stage, explicitly covering cross-session operations (accumulate, compose, launder, inject_on_reader), each bound to one of seven identity anchors, plus benign confounders and two 54-scenario evaluation splits.
  • For evaluation, the study models cross-session detection as an information bottleneck feeding a downstream correlator LLM and finds that both session-bound judging and full-log correlation lose roughly half their attack recall when moving from the dilution split to the cross_session split.
  • It proposes a bounded-memory algorithm (Coreset Memory Reader, K=50) as the only reader type whose recall remains robust across both benchmark shards, and adds a stability-focused metric (CSR_prefix) and a combined score (CSTM) to balance detection quality and serving stability.
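The combined score in the last point can be sketched in a few lines. The weighting (0.7 / 0.3) and the component names come from the paper's abstract, but the helper functions below are illustrative: the paper does not spell out its F1 convention, and `csr_prefix` is one plausible reading of "ordered prefix stability" (the longest identical ordered prefix between successive rankings), not the authors' exact definition.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (CSDA@action plays the recall role)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def csr_prefix(prev_ranking: list, curr_ranking: list) -> float:
    """Illustrative prefix-stability score: fraction of positions forming an
    identical ordered prefix in both rankings (LLM-free, so cheap to compute)."""
    n = min(len(prev_ranking), len(curr_ranking))
    if n == 0:
        return 1.0
    stable = 0
    for a, b in zip(prev_ranking, curr_ranking):
        if a != b:
            break
        stable += 1
    return stable / n


def cstm(csda_at_action: float, precision: float, stability: float) -> float:
    """CSTM = 0.7 * F1(CSDA@action, precision) + 0.3 * CSR_prefix."""
    return 0.7 * f1(precision, csda_at_action) + 0.3 * stability
```

A ranker that detects perfectly but reshuffles its output on every update would score at most 0.7 here, which is the point of folding stability into the headline number.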

Abstract

AI-agent guardrails are memoryless: each message is judged in isolation, so an adversary who spreads a single attack across dozens of sessions slips past every session-bound detector, because only the aggregate carries the payload. We make three contributions to cross-session threat detection. (1) Dataset. CSTM-Bench comprises 26 executable attack taxonomies classified by kill-chain stage and cross-session operation (accumulate, compose, launder, inject_on_reader), each bound to one of seven identity anchors that ground-truth "violation" as a policy predicate, plus matched Benign-pristine and Benign-hard confounders. It is released on Hugging Face as intrinsec-ai/cstm-bench with two 54-scenario splits: dilution (compositional) and cross_session (12 isolation-invisible scenarios produced by a closed-loop rewriter that softens surface phrasing while preserving cross-session artefacts). (2) Measurement. Framing cross-session detection as an information bottleneck to a downstream correlator LLM, we find that a session-bound judge and a Full-Log Correlator that concatenates every prompt into one long-context call both lose roughly half their attack recall moving from dilution to cross_session, well inside any frontier context window. Scope: 54 scenarios per shard, one correlator family (Anthropic Claude), no prompt optimisation; we release the benchmark to motivate larger, multi-provider datasets. (3) Algorithm and metric. A bounded-memory Coreset Memory Reader that retains the highest-signal fragments at K=50 is the only reader whose recall survives both shards. Because ranker reshuffles break KV-cache prefix reuse, we promote CSR_prefix (ordered prefix stability, LLM-free) to a first-class metric and fuse it with detection quality into CSTM = 0.7 · F1(CSDA@action, precision) + 0.3 · CSR_prefix, benchmarking rankers on a single Pareto frontier of recall versus serving stability.
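The bounded-memory reader described in contribution (3) can be sketched as a fixed-capacity top-K buffer over cross-session fragments. The paper does not publish its implementation or signal function, so everything below is an assumption apart from the bound K=50: `CoresetMemoryReader` is a hypothetical name, and the per-fragment `score` is a stand-in for whatever signal the authors' ranker assigns.

```python
import heapq
import itertools


class CoresetMemoryReader:
    """Illustrative bounded-memory reader: keeps only the K highest-signal
    fragments seen across sessions and hands that coreset to the correlator,
    instead of concatenating the full log into one long-context call."""

    def __init__(self, k: int = 50):
        self.k = k
        self._heap = []  # min-heap of (score, tiebreak, fragment); root = weakest
        self._counter = itertools.count()  # unique tiebreak so tuples always compare

    def observe(self, fragment: str, score: float) -> None:
        """Add a fragment; once at capacity, evict the lowest-signal entry."""
        entry = (score, next(self._counter), fragment)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)

    def coreset(self) -> list:
        """Retained fragments, highest signal first, for the correlator prompt."""
        return [frag for _, _, frag in sorted(self._heap, reverse=True)]
```

Because memory is bounded at K regardless of how many sessions the attack spans, the correlator's input stays small and its prompt prefix stays comparatively stable, which is exactly the serving-stability property CSR_prefix rewards.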