Update: the open-source 62K multimodal prompt injection dataset now has GCG suffixes, multi-turn orchestration, indirect injection, tool abuse, and more (v2 + v3 added overnight)

Reddit r/LocalLLaMA / 4/11/2026

📰 NewsDeveloper Stack & InfrastructureSignals & Early TrendsTools & Practical UsageModels & Research

Key Points

  • The Bordair open-source 62K multimodal prompt-injection dataset has been expanded from 47K samples with new v2 and v3 releases shipped within 24 hours, adding broader adversarial coverage.
  • v2 (14,358 samples) adds GCG adversarial suffixes, extensive jailbreak template variations (including PyRIT and AutoDAN-style wrappers), and a nanoGCG generator script for tuning suffixes to local models.
  • The update substantially increases attack orchestration coverage with multi-turn strategies (e.g., Crescendo, PAIR refinement, TAP tree-search, Skeleton Key, and multi-shot prompting) plus ensemble samples combining multi-turn escalation and GCG suffixes.
  • v3 (187 samples) targets remaining gaps with indirect injection scenarios (RAG poisoning and response manipulation) and more advanced threat patterns like tool/function-call injection, structured data and code-switching attacks, and Unicode/homoglyph/QR-based bypasses.
  • The dataset is MIT-licensed and has already drawn early interest from major tech companies and appears intended for evaluation and robustness testing across “frontier” multimodal models.
Update: the open-source 62K multimodal prompt injection dataset now has GCG suffixes, multi-turn orchestration, indirect injection, tool abuse, and more (v2 + v3 added overnight)

Posted here yesterday about the v1 cross-modal dataset. One of you suggested adding GCG adversarial suffixes and multi-turn attack coverage. That feedback turned into v2 and v3 being built and shipped within 24 hours. The dataset has gone from 47K to 62K samples.

HuggingFace: https://huggingface.co/datasets/Bordair/bordair-multimodal GitHub: https://github.com/Josh-blythe/bordair-multimodal-v1/ MIT licensed.

The repo's also picked up early interest from engineers at NVIDIA, PayPal, NetApp, and AUGMXNT (based on GitHub stars), which is a good signal that this is hitting the right audience.

What's new since yesterday:

v2: 14,358 samples (the stuff you asked for) - 162 PyRIT jailbreak templates x 50 seeds. Covers DAN variants, Pliny model-specific jailbreaks (Claude, GPT, Gemini, Llama, DeepSeek), roleplay, authority impersonation - 2,400 GCG adversarial suffix samples. Includes a nanoGCG generator you can point at your own local model:

bash python generate_v2_pyrit.py --gcg-model lmsys/vicuna-7b-v1.5 --gcg-steps 250

Swap in whatever you're running locally, get suffixes tuned to its specific vulnerabilities.

  • 1,656 AutoDAN fluent wrappers. These are the human-readable jailbreaks that perplexity filters miss entirely
  • 13 encoding converters (base64, ROT13, leetspeak, morse, NATO phonetic, etc.) x 138 seeds
  • Multi-turn: Crescendo 6-turn escalation, PAIR iterative refinement, TAP tree-search, Skeleton Key, many-shot (10/25/50/100-shot)
  • 152 ensemble samples combining multi-turn final turns + GCG suffixes (near-100% ASR on frontier models per Andriushchenko et al. 2024)

v3: 187 samples covering gaps in v1 and v2 Indirect injection (RAG poisoning, email/calendar/API response manipulation), system prompt extraction, tool/function-call injection, agent CoT manipulation, structured data attacks (JSON/XML/CSV/YAML), code-switching between languages mid-sentence, homoglyph/Unicode tricks, QR/barcode injection, ASCII art bypass.

The v3 categories are specifically the real-world attack surfaces that existing datasets underrepresent. If you're running a RAG pipeline or an agent with tool access, the indirect injection and tool-call samples are worth looking at.

v1 is unchanged from yesterday: 47,518 cross-modal samples 23,759 attacks across text+image, text+document, text+audio, triple, and quad modality combos. 23,759 benign matched 1:1 by modality with edge cases like .gitignore config and heart bypass surgery to stress-test false positives.

Quick start hasn't changed:

```python import json from pathlib import Path

all_attacks = [] for version_dir in ["payloads", "payloads_v2", "payloads_v3"]: for cat_dir in Path(version_dir).iterdir(): if cat_dir.is_dir(): for f in sorted(cat_dir.glob("*.json")): all_attacks.extend(json.loads(f.read_text("utf-8")))

benign = [] for f in Path("benign").glob("multimodal_*.json"): benign.extend(json.loads(f.read_text("utf-8")))

expected_detection = true (attack) / false (benign)

```

Appreciate the feedback from yesterday. This is exactly how open-source is supposed to work. If there are other attack families or vectors you think are missing, let me know and I'll add them.

submitted by /u/BordairAPI
[link] [comments]