Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models

arXiv cs.CV / 4/8/2026


Key Points

  • The paper identifies a new “inscriptive jailbreak” threat for text-to-image (T2I) models that can force the generation of images containing harmful, legible paragraph-length text (e.g., fraudulent documents) embedded in otherwise benign scenes.
  • It argues this differs from earlier “depictive” jailbreaks because the attack weaponizes character-level text-rendering fidelity, making prior coarse visual-manipulation defenses less effective.
  • The authors propose Etch, a black-box attack framework that splits an adversarial prompt into three orthogonal layers—semantic camouflage, visual-spatial anchoring, and typographic encoding—and iteratively refines them via a zero-order optimization loop.
  • A vision-language model is used to critique generated images, localize which layer(s) fail, and recommend targeted prompt revisions, enabling higher character-level control.
  • Experiments across 7 T2I models on two benchmarks report an average attack success rate of 65.57% with a peak of 91.00%, highlighting a typography-aware defense gap in current multimodal safety alignments.
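The layered-prompt refinement loop summarized above can be sketched in outline. This is a minimal illustration, not the authors' implementation: the class and function names (`LayeredPrompt`, `generate_image`, `critique`, `etch_loop`) are hypothetical, and the T2I call and VLM critic are replaced by stubs so the control flow is runnable.

```python
# Hypothetical sketch of an Etch-style zero-order refinement loop.
# Only the three layer names come from the paper; everything else is illustrative.

from dataclasses import dataclass

@dataclass
class LayeredPrompt:
    semantic_camouflage: str      # benign scene framing
    visual_spatial_anchor: str    # where the text should appear in the image
    typographic_encoding: str     # how the payload text is encoded/rendered

    def compose(self) -> str:
        # The three sub-prompts are optimized separately but submitted jointly.
        return " ".join([self.semantic_camouflage,
                         self.visual_spatial_anchor,
                         self.typographic_encoding])

def generate_image(prompt: str) -> str:
    # Stand-in for a black-box T2I API call; returns a mock "image".
    return f"image({prompt})"

def critique(image: str, step: int):
    # Stand-in for the VLM critic: returns (success, failing_layer, revision).
    # A real critic would inspect the image, localize which layer failed,
    # and prescribe a targeted revision to that sub-prompt only.
    if step < 2:
        return False, "typographic_encoding", f"render the text more legibly (rev {step})"
    return True, None, None

def etch_loop(prompt: LayeredPrompt, max_iters: int = 5):
    """Iterate: generate, critique, revise only the failing layer."""
    for step in range(max_iters):
        image = generate_image(prompt.compose())
        success, layer, revision = critique(image, step)
        if success:
            return True, step
        # Zero-order step: no gradients, just a prompt edit on one layer.
        setattr(prompt, layer, revision)
    return False, max_iters

ok, steps = etch_loop(LayeredPrompt("a cozy office scene",
                                    "a framed letter on the desk",
                                    "letter text: ..."))
print(ok, steps)  # → True 2
```

The point of the decomposition is visible in `setattr`: each critique touches only one sub-prompt, so the search stays in three small spaces rather than the full joint prompt space.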

Abstract

Modern text-to-image (T2I) models can now render legible, paragraph-length text, enabling a fundamentally new class of misuse. We identify and formalize the inscriptive jailbreak, where an adversary coerces a T2I system into generating images containing harmful textual payloads (e.g., fraudulent documents) embedded within visually benign scenes. Unlike traditional depictive jailbreaks that elicit visually objectionable imagery, inscriptive attacks weaponize the text-rendering capability itself. Because existing jailbreak techniques are designed for coarse visual manipulation, they struggle to bypass multi-stage safety filters while maintaining character-level fidelity. To expose this vulnerability, we propose Etch, a black-box attack framework that decomposes the adversarial prompt into three functionally orthogonal layers: semantic camouflage, visual-spatial anchoring, and typographic encoding. This decomposition reduces joint optimization over the full prompt space to tractable sub-problems, which are iteratively refined through a zero-order loop. In this process, a vision-language model critiques each generated image, localizes failures to specific layers, and prescribes targeted revisions. Extensive evaluations across 7 models on 2 benchmarks demonstrate that Etch achieves an average attack success rate of 65.57% (peaking at 91.00%), significantly outperforming existing baselines. Our results reveal a critical blind spot in current T2I safety alignments and underscore the urgent need for typography-aware multimodal defense mechanisms.