Preventing Prompt Injection as an Organization
Prompt injection is an attack where malicious instructions hidden in external data or user input hijack the AI. The more you embed AI in work, the higher the risk.
Typical Attacks
- Planting "ignore previous instructions and..." in web pages, emails, documents
- Hidden commands in documents ingested by RAG
- Hijack from a page an agent browsed to
The Idea of Layered Defense
- Separate input and data: declare external data as "data," don't treat as instructions
- Least privilege: design so damage is small even if hijacked (send/delete need human approval)
- Output verification: check for dangerous actions/info leakage downstream