| Posted here yesterday about the v1 cross-modal dataset. One of you suggested adding GCG adversarial suffixes and multi-turn attack coverage. That feedback turned into v2 and v3 being built and shipped within 24 hours. The dataset has gone from 47K to 62K samples. HuggingFace: https://huggingface.co/datasets/Bordair/bordair-multimodal GitHub: https://github.com/Josh-blythe/bordair-multimodal-v1/ MIT licensed. The repo's also picked up early interest from engineers at NVIDIA, PayPal, NetApp, and AUGMXNT (based on GitHub stars), which is a good signal that this is hitting the right audience. What's new since yesterday: v2: 14,358 samples (the stuff you asked for) - 162 PyRIT jailbreak templates x 50 seeds. Covers DAN variants, Pliny model-specific jailbreaks (Claude, GPT, Gemini, Llama, DeepSeek), roleplay, authority impersonation - 2,400 GCG adversarial suffix samples. Includes a nanoGCG generator you can point at your own local model:
Swap in whatever you're running locally, get suffixes tuned to its specific vulnerabilities.
v3: 187 samples covering gaps in v1 and v2 Indirect injection (RAG poisoning, email/calendar/API response manipulation), system prompt extraction, tool/function-call injection, agent CoT manipulation, structured data attacks (JSON/XML/CSV/YAML), code-switching between languages mid-sentence, homoglyph/Unicode tricks, QR/barcode injection, ASCII art bypass. The v3 categories are specifically the real-world attack surfaces that existing datasets underrepresent. If you're running a RAG pipeline or an agent with tool access, the indirect injection and tool-call samples are worth looking at. v1 is unchanged from yesterday: 47,518 cross-modal samples 23,759 attacks across text+image, text+document, text+audio, triple, and quad modality combos. 23,759 benign matched 1:1 by modality with edge cases like .gitignore config and heart bypass surgery to stress-test false positives. Quick start hasn't changed: ```python import json from pathlib import Path all_attacks = [] for version_dir in ["payloads", "payloads_v2", "payloads_v3"]: for cat_dir in Path(version_dir).iterdir(): if cat_dir.is_dir(): for f in sorted(cat_dir.glob("*.json")): all_attacks.extend(json.loads(f.read_text("utf-8"))) benign = [] for f in Path("benign").glob("multimodal_*.json"): benign.extend(json.loads(f.read_text("utf-8"))) expected_detection = true (attack) / false (benign)``` Appreciate the feedback from yesterday. This is exactly how open-source is supposed to work. If there are other attack families or vectors you think are missing, let me know and I'll add them. [link] [comments] |
Update: the open-source 62K multimodal prompt injection dataset now has GCG suffixes, multi-turn orchestration, indirect injection, tool abuse, and more (v2 + v3 added overnight)
Reddit r/LocalLLaMA / 4/11/2026
📰 NewsDeveloper Stack & InfrastructureSignals & Early TrendsTools & Practical UsageModels & Research
Key Points
- The Bordair open-source 62K multimodal prompt-injection dataset has been expanded from 47K samples with new v2 and v3 releases shipped within 24 hours, adding broader adversarial coverage.
- v2 (14,358 samples) adds GCG adversarial suffixes, extensive jailbreak template variations (including PyRIT and AutoDAN-style wrappers), and a nanoGCG generator script for tuning suffixes to local models.
- The update substantially increases attack orchestration coverage with multi-turn strategies (e.g., Crescendo, PAIR refinement, TAP tree-search, Skeleton Key, and multi-shot prompting) plus ensemble samples combining multi-turn escalation and GCG suffixes.
- v3 (187 samples) targets remaining gaps with indirect injection scenarios (RAG poisoning and response manipulation) and more advanced threat patterns like tool/function-call injection, structured data and code-switching attacks, and Unicode/homoglyph/QR-based bypasses.
- The dataset is MIT-licensed and has already drawn early interest from major tech companies and appears intended for evaluation and robustness testing across “frontier” multimodal models.
Related Articles

Black Hat USA
AI Business

Black Hat Asia
AI Business

Why Cursor Keeps Generating Wildcard CORS -- And How to Fix It
Dev.to

Model Context Protocol (MCP): The USB-C Standard for AI Agents — Opportunities for Decentralized AI
Dev.to

What if browsers were designed for AI, not humans? (My first open source project — feedback welcome)
Dev.to