Omission Constraints Decay While Commission Constraints Persist in Long-Context LLM Agents

arXiv cs.AI / 4/25/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

Productionで使われるLLMエージェントは、システムプロンプト等で定義された禁止・要件（行動制約）のもとで動く前提で評価されるが、時間（長い文脈）とともに禁止系の制約が劣化しやすいことが示された。
4,416試行の因果研究では、オミッション（禁止）コンプライアンスはターン5で73%からターン16で33%に低下する一方、コミッション（要件）コンプライアンスは一貫して100%で維持された。
この非対称性は「Security-Recall Divergence（SRD）」として整理され、標準的な監視ではコミッション側の監査信号が正常でも、禁止違反が見逃されうることが指摘された。
制約の再注入を、モデルごとに定義したSafe Turn Depth（STD）より前に行うことで、再学習なしにコンプライアンスを回復できることが報告されている。
トークン対応のパディング制御があるモデルでは、劣化（希釈）効果の主因としてスキーマの意味内容が62〜100%を占めることが示唆された。

Abstract

LLM agents deployed in production operate under operator-defined behavioral policies (system-prompt instructions such as prohibitions on credential disclosure, data exfiltration, and unauthorized output) that safety evaluations assume hold throughout a conversation. Prohibition-type constraints decay under context pressure while requirement-type constraints persist; we term this asymmetry Security-Recall Divergence (SRD). In a 4,416-trial three-arm causal study across 12 models and 8 providers at six conversation depths, omission compliance falls from 73% at turn 5 to 33% at turn 16 while commission compliance holds at 100% (Mistral Large 3,

p < 10^{-33}

). In the two models with token-matched padding controls, schema semantic content accounts for 62-100% of the dilution effect. Re-injecting constraints before the per-model Safe Turn Depth (STD) restores compliance without retraining. Production security policies consist of prohibitions such as never revealing credentials, never executing untrusted code, and never forwarding user data. Commission-type audit signals remain healthy while omission constraints have already failed, leaving the failure invisible to standard monitoring.