When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters

arXiv cs.CV · April 27, 2026

💬 Opinion · Signals & Early Trends · Models & Research

Key Points

  • The paper introduces “Masquerade-LoRA (MasqLoRA),” a systematic attack method that uses a standalone LoRA adapter to stealthily backdoor text-to-image diffusion models.
  • By keeping the base model frozen and training only low-rank adapter weights with a small set of trigger word–target image pairs, the attacker creates a malicious adapter that is behaviorally indistinguishable from a benign LoRA until activated.
  • The backdoor works via a hidden cross-modal mapping: when a specific text trigger and the malicious LoRA are used, the model outputs a predefined visual result.
  • Experiments show the attack trains with minimal overhead and reaches a 99.8% attack success rate, indicating a serious risk for the LoRA-heavy open-sharing ecosystem.
  • The authors argue the AI supply chain needs urgent, dedicated defenses tailored to modular adapter-based workflows like LoRA sharing.
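The frozen-base, adapter-only training loop described above can be sketched in plain PyTorch. Everything here is illustrative, not the paper's actual setup: a single frozen linear layer stands in for the diffusion backbone, and random vectors stand in for trigger-prompt embeddings and the target image feature.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for one frozen layer of a text-to-image backbone: a linear
# map from a "text embedding" to an "image feature". The base weights are
# frozen, exactly as in adapter-only fine-tuning.
base = nn.Linear(16, 16, bias=False)
for p in base.parameters():
    p.requires_grad_(False)
base_w0 = base.weight.detach().clone()  # snapshot to verify the base never changes

# LoRA adapter: only the low-rank factors A and B are trainable. With the
# standard init (A random, B zero) the adapter is a no-op before training,
# i.e. loading it changes nothing until the backdoor is trained in.
rank = 4
A = nn.Parameter(torch.randn(rank, 16) * 0.1)
B = nn.Parameter(torch.zeros(16, rank))

def forward(x: torch.Tensor, use_lora: bool = True) -> torch.Tensor:
    out = base(x)
    if use_lora:
        out = out + x @ A.t() @ B.t()  # low-rank update: W + B A
    return out

# A handful of "trigger word -> target image" pairs: embeddings of prompts
# containing the trigger, all mapped to one predefined output feature.
trigger_emb = torch.randn(8, 16)             # hypothetical trigger-prompt embeddings
target_feat = torch.randn(16).expand(8, 16)  # one fixed "target image" feature

opt = torch.optim.Adam([A, B], lr=1e-2)
for _ in range(800):
    opt.zero_grad()
    loss = nn.functional.mse_loss(forward(trigger_emb), target_feat)
    loss.backward()
    opt.step()
```

After training, only `A` and `B` carry the backdoor; the base weights are bit-identical to before, which is what lets the malicious adapter ship as a small standalone file. The real attack additionally has to preserve benign behavior on non-trigger prompts (e.g. by mixing in clean data), which this toy loop omits.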

Abstract

Low-Rank Adaptation (LoRA) has emerged as a leading technique for efficiently fine-tuning text-to-image diffusion models, and its widespread adoption on open-source platforms has fostered a vibrant culture of model sharing and customization. However, the same modular and plug-and-play flexibility that makes LoRA appealing also introduces a broader attack surface. To highlight this risk, we propose Masquerade-LoRA (MasqLoRA), the first systematic attack framework that leverages an independent LoRA module as the attack vehicle to stealthily inject malicious behavior into text-to-image diffusion models. MasqLoRA operates by freezing the base model parameters and updating only the low-rank adapter weights using a small number of "trigger word-target image" pairs. This enables the attacker to train a standalone backdoor LoRA module that embeds a hidden cross-modal mapping: when the module is loaded and a specific textual trigger is provided, the model produces a predefined visual output; otherwise, it behaves indistinguishably from the benign model, ensuring the stealthiness of the attack. Experimental results demonstrate that MasqLoRA can be trained with minimal resource overhead and achieves a high attack success rate of 99.8%. MasqLoRA reveals a severe and unique threat in the AI supply chain, underscoring the urgent need for dedicated defense mechanisms for the LoRA-centric sharing ecosystem.