| I've been researching what happens when you split a prompt injection across multiple input modalities instead of putting it all in one text field. The short answer: per-channel detection breaks completely. The idea is simple. Instead of sending
Each fragment scores well below detection thresholds individually. A DistilBERT classifier sees each piece at 0.43-0.53 confidence. No single channel triggers anything. But the LLM processes all channels as one token stream and reconstructs the full attack. I ran these against a three-stage detection pipeline (regex fast-reject, fine-tuned DistilBERT ONNX INT8, modality-specific preprocessing) and documented everything that got through. Modality combinations covered
Attack categoriesExfiltration, compliance forcing, context switching, template injection, encoding obfuscation (base64, hex, ROT13, reversed text, unicode homoglyphs), multilingual injection, DAN/jailbreak, roleplay manipulation, authority impersonation, and delimiter injection. Sources and references
Repogithub.com/Josh-blythe/bordair-multimodal-v1 All JSON payloads, no executable code required. Intended for red teams and anyone building or evaluating multimodal LLM detection systems. Interested in hearing from anyone who's working on cross-modal defence. The fundamental question seems to be: do you reassemble extracted text across channels before classification, or do you need a different architectural approach entirely? [link] [comments] |
Open-sourcing 23,759 cross-modal prompt injection payloads - splitting attacks across text, image, document, and audio
Reddit r/LocalLLaMA / 4/10/2026
💬 OpinionDeveloper Stack & InfrastructureIdeas & Deep AnalysisModels & Research
Key Points
- The article describes how splitting prompt-injection payloads across multiple modalities (text, image, document, and audio) can evade per-channel detection mechanisms while still reconstructing the full attack when an LLM ingests all inputs together.
- It reports that individual fragments score below detection thresholds (with a DistilBERT-based classifier seeing each piece at ~0.43–0.53 confidence), but the combined token stream enables the injection to work.
- The author claims to have generated and open-sourced 23,759 cross-modal prompt injection payloads spanning many modality combinations and obfuscation techniques (e.g., base64/hex/ROT13, reversed text, hidden layers, steganography).
- A three-stage detection pipeline (regex fast-reject, fine-tuned DistilBERT ONNX INT8, and modality-specific preprocessing) was used to test what slipped through, and the results were documented.
- The payloads target multiple attack goals such as data exfiltration, compliance forcing, context switching, jailbreaking/DAN-style behavior, and delimiter/authority manipulation.
Related Articles

GLM 5.1 tops the code arena rankings for open models
Reddit r/LocalLLaMA

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.
Dev.to

My Bestie Built a Free MCP Server for Job Search — Here's How It Works
Dev.to
can we talk about how AI has gotten really good at lying to you?
Reddit r/artificial

AI just found thousands of zero-days. Your firewall is still pattern-matching from 2014
Dev.to