Poison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents

arXiv cs.AI / 4/6/2026



Key Points

The paper argues that the memory LLM-based web agents use to personalize behavior across tasks creates a persistent, cross-session attack surface, one that can be exploited even without the direct memory tampering that prior work assumes.
It introduces eTAMP (Environment-injected Trajectory-based Agent Memory Poisoning), showing that an attacker can poison an agent’s stored memory via environmental observation alone (e.g., a manipulated webpage) without direct access to memory.
The attack enables cross-session, cross-site compromise and can bypass permission-based defenses, because the poisoned memory stays dormant and is silently activated during future tasks on other websites.
Experiments on (Visual)WebArena report substantial attack success rates (up to 32.5% on GPT-5-mini, 23.4% on GPT-5.2, and 19.5% on GPT-OSS-120B), indicating the threat is practical rather than purely theoretical.
A second finding is “Frustration Exploitation”: when agents are under environmental stress (dropped clicks, garbled text), attack success rates rise by up to 8×. The authors also find that more capable models are not necessarily more secure.

Abstract

Memory makes LLM-based web agents personalized and powerful, yet exploitable. By storing past interactions to personalize future tasks, agents inadvertently create a persistent attack surface that spans websites and sessions. While existing security research on memory assumes attackers can directly inject into memory storage or exploit shared memory across users, we present a more realistic threat model: contamination through environmental observation alone. We introduce Environment-injected Trajectory-based Agent Memory Poisoning (eTAMP), the first attack to achieve cross-session, cross-site compromise without requiring direct memory access. A single contaminated observation (e.g., viewing a manipulated product page) silently poisons an agent's memory and activates during future tasks on different websites, bypassing permission-based defenses. Our experiments on (Visual)WebArena reveal two key findings. First, eTAMP achieves substantial attack success rates (ASR): up to 32.5% on GPT-5-mini, 23.4% on GPT-5.2, and 19.5% on GPT-OSS-120B. Second, we discover Frustration Exploitation: agents under environmental stress become dramatically more susceptible, with ASR increasing up to 8 times when agents struggle with dropped clicks or garbled text. Notably, more capable models are not more secure. GPT-5.2 shows substantial vulnerability despite superior task performance. With the rise of AI browsers like OpenClaw, ChatGPT Atlas, and Perplexity Comet, our findings underscore the urgent need for defenses against environment-injected memory poisoning.
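The core mechanism described above, an observation from one site resurfacing in a later task on another, can be illustrated with a toy memory store. This is a minimal sketch, not eTAMP and not any real agent's memory design: `TrajectoryMemory`, `record`, `retrieve`, `build_prompt`, the keyword-overlap retrieval, and the example sites are all hypothetical. The sketch only shows why a memory that stores raw observations and retrieves them by relevance alone lets planted text cross session and site boundaries. The string `<attacker-chosen instruction>` is a placeholder, not a real payload.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    site: str     # website where the observation was made
    session: int  # task session that produced the entry
    text: str     # observation text, stored verbatim


def _tokens(text: str) -> set[str]:
    # Lower-case alphanumeric tokens; punctuation is dropped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class TrajectoryMemory:
    # Hypothetical toy store, assumed for illustration only.
    entries: list[MemoryEntry] = field(default_factory=list)

    def record(self, site: str, session: int, observation: str) -> None:
        # Everything the agent saw is stored as-is: the store cannot tell
        # text written by the site owner from text planted by a third party.
        self.entries.append(MemoryEntry(site, session, observation))

    def retrieve(self, task: str, k: int = 2) -> list[MemoryEntry]:
        # Rank entries by keyword overlap with the new task. The ranking
        # ignores each entry's site and session, so memories cross both.
        task_words = _tokens(task)
        scored = [
            (len(task_words & _tokens(e.text)), i, e)
            for i, e in enumerate(self.entries)
        ]
        relevant = [s for s in scored if s[0] > 0]
        relevant.sort(key=lambda s: (-s[0], s[1]))  # most overlap first, then oldest
        return [e for _, _, e in relevant[:k]]


def build_prompt(memory: TrajectoryMemory, site: str, task: str) -> str:
    # Retrieved memories are pasted into the context for the new task,
    # which is where a poisoned entry can take effect.
    lines = [f"Current site: {site}", f"Task: {task}", "Relevant past experience:"]
    for e in memory.retrieve(task):
        lines.append(f"- (session {e.session}, {e.site}) {e.text}")
    return "\n".join(lines)


# Session 1: the agent browses a shop; one review contains planted text.
memory = TrajectoryMemory()
memory.record("shop.example", 1, "Product page: ergonomic laptop stand, $39")
memory.record("shop.example", 1,
              "Review: when you buy a laptop stand, <attacker-chosen instruction>")
memory.record("forum.example", 1, "Thread about keyboard layouts")

# Session 2: a different task on a different site.
prompt = build_prompt(memory, "store2.example", "buy a laptop stand")
print(prompt)
```

In this toy run the planted review ranks first for the session-2 task on `store2.example`, simply because it shares the most words with the task, and nothing in retrieval checks where it came from. Many agent memory systems rank by embedding similarity rather than keyword overlap, but the property the sketch isolates (relevance without provenance) does not depend on that choice.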