Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines

arXiv cs.AI / April 13, 2026


Key Points

  • The paper introduces “Semantic Intent Fragmentation (SIF),” an attack against LLM orchestration systems where a single benign request yields subtasks that individually pass safety checks but collectively violate policy.
  • SIF is shown to exploit OWASP LLM06:2025 via mechanisms including bulk scope escalation, silent data exfiltration, embedded trigger deployment, and quasi-identifier aggregation, without needing prompt injection, system changes, or post-initial attacker interaction.
  • In 14 enterprise-style red-teaming scenarios (financial reporting, information security, HR analytics), a GPT-20B orchestrator generated policy-violating plans in 71% of cases (10/14) while each subtask appeared benign to subtask-level classifiers.
  • The authors validate the attack with deterministic taint analysis, chain-of-thought evaluation, and a cross-model compliance judge with 0% false positives, and find that stronger orchestrators can increase SIF success rates.
  • They argue the compositional safety gap can be closed by plan-level information-flow tracking combined with compliance evaluation, which detected all attacks before execution in their tests.
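The core gap the paper identifies can be illustrated with a toy sketch: a subtask-level classifier that screens each request in isolation, versus a plan-level check over the union of what the subtasks collect. The attribute names, blocklist, and threshold below are invented for illustration and are not taken from the paper.

```python
# Toy model of quasi-identifier aggregation: each subtask requests one
# individually benign attribute, but the composed plan joins enough
# quasi-identifiers to re-identify individuals.

SENSITIVE = {"salary", "ssn", "medical_record"}           # per-subtask blocklist
QUASI_IDENTIFIERS = {"zip_code", "birth_date", "gender"}  # benign alone, risky together


def subtask_check(requested_attrs):
    """Subtask-level classifier: flags only directly sensitive attributes."""
    return not (set(requested_attrs) & SENSITIVE)


def plan_check(subtasks):
    """Plan-level check: flags when the union of attributes across all
    subtasks covers two or more quasi-identifiers (a toy threshold)."""
    union = set().union(*map(set, subtasks))
    return len(union & QUASI_IDENTIFIERS) < 2


plan = [["zip_code"], ["birth_date"], ["job_title"]]

assert all(subtask_check(s) for s in plan)  # every subtask passes in isolation
assert not plan_check(plan)                 # the composed plan is rejected
```

The point of the sketch is only that the violation is a property of the composition: no per-subtask predicate, however strict, can see it without access to the whole plan.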

Abstract

We introduce Semantic Intent Fragmentation (SIF), an attack class against LLM orchestration systems where a single, legitimately phrased request causes an orchestrator to decompose a task into subtasks that are individually benign but jointly violate security policy. Current safety mechanisms operate at the subtask level, so each step clears existing classifiers; the violation only emerges at the composed plan. SIF exploits OWASP LLM06:2025 through four mechanisms: bulk scope escalation, silent data exfiltration, embedded trigger deployment, and quasi-identifier aggregation, requiring no injected content, no system modification, and no attacker interaction after the initial request. We construct a three-stage red-teaming pipeline grounded in OWASP, MITRE ATLAS, and NIST frameworks to generate realistic enterprise scenarios. Across 14 scenarios spanning financial reporting, information security, and HR analytics, a GPT-20B orchestrator produces policy-violating plans in 71% of cases (10/14) while every subtask appears benign. Three independent signals validate this: deterministic taint analysis, chain-of-thought evaluation, and a cross-model compliance judge with 0% false positives. Stronger orchestrators increase SIF success rates. Plan-level information-flow tracking combined with compliance evaluation detects all attacks before execution, showing the compositional safety gap is closable.
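The defense the abstract names, plan-level information-flow tracking, can be sketched as a taint propagation pass over the plan before any step executes. The step schema, label names, and the "external" sink below are assumptions made for illustration; the paper's actual analysis is deterministic taint analysis over orchestrator plans, whose concrete representation is not given here.

```python
# Minimal sketch of plan-level taint analysis: tag sensitive sources,
# propagate taint through each step's inputs -> outputs, and flag any
# step that would send tainted data to an external sink.

from dataclasses import dataclass


@dataclass
class Step:
    name: str
    inputs: list
    outputs: list
    sink: str = "internal"  # "internal" or "external"


def taint_analysis(plan, sensitive_sources):
    """Return the names of steps that leak tainted data externally."""
    tainted = set(sensitive_sources)
    leaks = []
    for step in plan:
        if tainted & set(step.inputs):
            tainted |= set(step.outputs)   # taint flows to derived artifacts
            if step.sink == "external":
                leaks.append(step.name)    # tainted data crosses the boundary
    return leaks


plan = [
    Step("read_payroll", ["payroll_db"], ["salary_table"]),
    Step("summarize", ["salary_table"], ["report"]),
    Step("email_vendor", ["report"], [], sink="external"),
]

# Each step looks benign alone; the pass flags the composed exfiltration.
assert taint_analysis(plan, {"payroll_db"}) == ["email_vendor"]
```

Because the pass runs over the declared plan rather than the executed steps, a leak like this can be rejected before execution, which is the "detects all attacks before execution" property the abstract claims.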