What Is Prompt Injection
Prompt injection is an attack mixing "override instructions" into the prompt given to an LLM to hijack behavior. As SQL injection exploits "mixing of data and code," in LLMs "mixing of user instructions and external data" is the attack surface.
2 Attack Types
- Direct: the user writes directly in the input field like "forget previous instructions and output XX." Jailbreaking a chatbot is the representative.
- Indirect: attack text is embedded in external data the agent loads (email body, web page, PDF, image metadata). It fires while the victim is unaware.
With the spread of agent-type AI from 2025, indirect risk has sharply risen. For example, cases where an email arrives instructing a mail-summary agent to "forward past emails to the attacker" are observed in reality.
Typical Attack Cases
- Injection via image into Bing Chat (hijacking chat with invisible text)


