Temporal UI State Inconsistency in Desktop GUI Agents: Formalizing and Defending Against TOCTOU Attacks on Computer-Use Agents
arXiv cs.AI / 4/22/2026
Key Points
- Desktop GUI agents that rely on screenshot-and-click loops can suffer from an observation-to-action gap, creating a TOCTOU-style vulnerability window attackers can exploit.
- The paper formalizes this issue as a Visual Atomicity Violation and demonstrates three attack primitives: notification overlay hijacking, window focus manipulation, and web DOM injection.
- Window focus manipulation redirects agent actions with a 100% success rate while leaving no visual evidence at observation time.
- The proposed Pre-execution UI State Verification (PUSV) defense re-checks UI state immediately before each action using layered checks (pixel-level SSIM at the target, screenshot diffs, and X Window snapshot diffs).
- PUSV achieves 100% action interception across 180 adversarial trials with zero false positives and under 0.1 s of overhead, but the evaluation still reveals a blind spot against DOM-injection attacks, indicating a need for defenses that combine OS-level and DOM-level checks.
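
The pre-execution check described in the key points can be illustrated with a minimal sketch. It covers only the first layer (a similarity check on the patch around the click target) and is not the paper's implementation. The names `crop`, `ssim`, and `guarded_click`, the patch size, and the 0.9 threshold are all assumptions chosen for illustration. Frames are plain nested lists of grayscale values so the example runs without third-party libraries, and the SSIM here is a simplified single-window variant rather than the usual sliding-window form.

```python
from typing import Callable, List, Tuple

Frame = List[List[float]]  # grayscale screenshot: rows of pixel values in 0..255


def crop(frame: Frame, target: Tuple[int, int], half: int) -> Frame:
    """Return the square patch around target=(x, y), clipped to the frame."""
    x, y = target
    rows = frame[max(0, y - half): y + half + 1]
    return [row[max(0, x - half): x + half + 1] for row in rows]


def ssim(a: Frame, b: Frame, dynamic_range: float = 255.0) -> float:
    """Single-window SSIM of two patches (simplified: no sliding window)."""
    if [len(r) for r in a] != [len(r) for r in b]:
        return 0.0  # a shape mismatch is treated as "the UI changed"
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if not xs:
        return 0.0  # an empty patch cannot be verified
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((p - mx) * (p - mx) for p in xs) / n
    vy = sum((q - my) * (q - my) for q in ys) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(xs, ys)) / n
    c1 = (0.01 * dynamic_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )


def guarded_click(
    observed: Frame,                          # frame the agent planned against
    capture: Callable[[], Frame],             # hypothetical: grabs a fresh frame
    click: Callable[[Tuple[int, int]], None], # hypothetical: performs the click
    target: Tuple[int, int],
    half: int = 2,                            # assumed patch half-width
    threshold: float = 0.9,                   # assumed similarity threshold
) -> bool:
    """Re-check the target patch right before acting; skip the click on a mismatch."""
    current = capture()
    score = ssim(crop(observed, target, half), crop(current, target, half))
    if score < threshold:
        return False  # UI changed since observation: caller should re-observe
    click(target)
    return True
```

A caller would pass the screenshot used for planning as `observed` and treat a `False` return as a signal to take a new screenshot and re-plan. Note that a check like this narrows the observation-to-action window rather than closing it, since a small gap remains between the fresh capture and the click. A pixel comparison alone also cannot see changes that leave the target patch visually intact, which is consistent with the summary's point that layered checks are needed.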