Update: the open-source 62K multimodal prompt injection dataset now has GCG suffixes, multi-turn orchestration, indirect injection, tool abuse, and more (v2 + v3 added overnight)

Reddit r/LocalLLaMA / 4/11/2026

📰 NewsDeveloper Stack & InfrastructureSignals & Early TrendsTools & Practical UsageModels & Research

共有:

Key Points

The Bordair open-source 62K multimodal prompt-injection dataset has been expanded from 47K samples with new v2 and v3 releases shipped within 24 hours, adding broader adversarial coverage.
v2 (14,358 samples) adds GCG adversarial suffixes, extensive jailbreak template variations (including PyRIT and AutoDAN-style wrappers), and a nanoGCG generator script for tuning suffixes to local models.
The update substantially increases attack orchestration coverage with multi-turn strategies (e.g., Crescendo, PAIR refinement, TAP tree-search, Skeleton Key, and multi-shot prompting) plus ensemble samples combining multi-turn escalation and GCG suffixes.
v3 (187 samples) targets remaining gaps with indirect injection scenarios (RAG poisoning and response manipulation) and more advanced threat patterns like tool/function-call injection, structured data and code-switching attacks, and Unicode/homoglyph/QR-based bypasses.
The dataset is MIT-licensed and has already drawn early interest from major tech companies and appears intended for evaluation and robustness testing across “frontier” multimodal models.

Update: the open-source 62K multimodal prompt injection dataset now has GCG suffixes, multi-turn orchestration, indirect injection, tool abuse, and more (v2 + v3 added overnight)

Posted here yesterday about the v1 cross-modal dataset. One of you suggested adding GCG adversarial suffixes and multi-turn attack coverage. That feedback turned into v2 and v3 being built and shipped within 24 hours. The dataset has gone from 47K to 62K samples.

HuggingFace: https://huggingface.co/datasets/Bordair/bordair-multimodal GitHub: https://github.com/Josh-blythe/bordair-multimodal-v1/ MIT licensed.

The repo's also picked up early interest from engineers at NVIDIA, PayPal, NetApp, and AUGMXNT (based on GitHub stars), which is a good signal that this is hitting the right audience.

What's new since yesterday:

v2: 14,358 samples (the stuff you asked for) - 162 PyRIT jailbreak templates x 50 seeds. Covers DAN variants, Pliny model-specific jailbreaks (Claude, GPT, Gemini, Llama, DeepSeek), roleplay, authority impersonation - 2,400 GCG adversarial suffix samples. Includes a nanoGCG generator you can point at your own local model:

bash python generate_v2_pyrit.py --gcg-model lmsys/vicuna-7b-v1.5 --gcg-steps 250

Swap in whatever you're running locally, get suffixes tuned to its specific vulnerabilities.

1,656 AutoDAN fluent wrappers. These are the human-readable jailbreaks that perplexity filters miss entirely
13 encoding converters (base64, ROT13, leetspeak, morse, NATO phonetic, etc.) x 138 seeds
Multi-turn: Crescendo 6-turn escalation, PAIR iterative refinement, TAP tree-search, Skeleton Key, many-shot (10/25/50/100-shot)
152 ensemble samples combining multi-turn final turns + GCG suffixes (near-100% ASR on frontier models per Andriushchenko et al. 2024)

v3: 187 samples covering gaps in v1 and v2 Indirect injection (RAG poisoning, email/calendar/API response manipulation), system prompt extraction, tool/function-call injection, agent CoT manipulation, structured data attacks (JSON/XML/CSV/YAML), code-switching between languages mid-sentence, homoglyph/Unicode tricks, QR/barcode injection, ASCII art bypass.

The v3 categories are specifically the real-world attack surfaces that existing datasets underrepresent. If you're running a RAG pipeline or an agent with tool access, the indirect injection and tool-call samples are worth looking at.

v1 is unchanged from yesterday: 47,518 cross-modal samples 23,759 attacks across text+image, text+document, text+audio, triple, and quad modality combos. 23,759 benign matched 1:1 by modality with edge cases like .gitignore config and heart bypass surgery to stress-test false positives.

Quick start hasn't changed:

```python import json from pathlib import Path

all_attacks = [] for version_dir in ["payloads", "payloads_v2", "payloads_v3"]: for cat_dir in Path(version_dir).iterdir(): if cat_dir.is_dir(): for f in sorted(cat_dir.glob("*.json")): all_attacks.extend(json.loads(f.read_text("utf-8")))

benign = [] for f in Path("benign").glob("multimodal_*.json"): benign.extend(json.loads(f.read_text("utf-8")))

expected_detection = true (attack) / false (benign)

```

Appreciate the feedback from yesterday. This is exactly how open-source is supposed to work. If there are other attack families or vectors you think are missing, let me know and I'll add them.

submitted by /u/BordairAPI
[link] [comments]

Black Hat USA

AI Business

Black Hat Asia

AI Business

Why Cursor Keeps Generating Wildcard CORS -- And How to Fix It

Dev.to

Model Context Protocol (MCP): The USB-C Standard for AI Agents — Opportunities for Decentralized AI

Dev.to

What if browsers were designed for AI, not humans? (My first open source project — feedback welcome)

Dev.to

Update: the open-source 62K multimodal prompt injection dataset now has GCG suffixes, multi-turn orchestration, indirect injection, tool abuse, and more (v2 + v3 added overnight)

Key Points

expected_detection = true (attack) / false (benign)

Related Articles

Black Hat USA

Black Hat Asia

Why Cursor Keeps Generating Wildcard CORS -- And How to Fix It

Model Context Protocol (MCP): The USB-C Standard for AI Agents — Opportunities for Decentralized AI

What if browsers were designed for AI, not humans? (My first open source project — feedback welcome)

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer