Almost for Free: Crafting Adversarial Examples with Convolutional Image Filters

arXiv cs.LG / 5/5/2026


Key Points

  • The paper introduces a gradient-free method for generating adversarial examples by designing adversarial image filters inspired by explainable ML and classic edge detection.
  • These learned 3×3 (and related) convolutional filters enable untargeted attacks that transfer across different neural networks, with each adversarial example produced in a single pass over the input (see the sketch after this list).
  • Experiments show that 3×3 filters already achieve success rates of roughly 30%–80% across multiple models, demonstrating practical attack strength.
  • Compared with generative-model-based approaches to crafting adversarial examples, the method cuts the number of parameters by about five orders of magnitude, making the attack far more efficient.
  • Analyzing the learned filter parameters, the authors observe high transferability between models and structures reminiscent of classic image filters, reinforcing concerns about the fragility of neural networks to malicious perturbations.

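As a rough illustration of this single-pass mechanism (not the authors' exact procedure), the sketch below convolves an image with a fixed 3×3 kernel and blends the filter response back into the input. The Sobel-style kernel values, the `strength` blending factor, and the `apply_adversarial_filter` helper are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: perturbing an image with a single 3x3 convolution pass.
import torch
import torch.nn.functional as F

def apply_adversarial_filter(image: torch.Tensor, kernel: torch.Tensor,
                             strength: float = 0.1) -> torch.Tensor:
    """Convolve each channel with `kernel` and blend the response back in.

    image:  (C, H, W) tensor with values in [0, 1]
    kernel: (3, 3) filter, e.g. one learned to induce misclassification
    """
    c = image.shape[0]
    # Depthwise 3x3 convolution (one copy of the kernel per channel);
    # padding=1 preserves the spatial size.
    weight = kernel.expand(c, 1, 3, 3)
    filtered = F.conv2d(image.unsqueeze(0), weight, padding=1, groups=c)
    # Add the filter response as a perturbation and clip to the valid range.
    return (image + strength * filtered.squeeze(0)).clamp(0.0, 1.0)

# Sobel-style edge kernel as a stand-in for a learned adversarial filter.
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]])
x = torch.rand(3, 224, 224)                  # placeholder image
x_adv = apply_adversarial_filter(x, sobel_x)
```

Note that a 3×3 kernel carries only nine parameters, versus the hundreds of thousands to millions typical of generator networks, which is consistent with the five-orders-of-magnitude reduction cited above.
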
Abstract

Adversarial examples in machine learning are typically generated using gradients, obtained either directly through access to the model or approximated via queries to it. In this paper, we propose a much simpler approach to craft adversarial examples, drawing inspiration from insights of explainable machine learning. In particular, we design *adversarial image filters* that are based on classic edge detection algorithms but optimized to deceive learning models. The resulting untargeted attacks are transferable and require only a single pass over the input. Empirically, we find that 3×3 filters already enable success rates between 30% and 80% on different neural networks. Compared to related approaches using generative models for crafting adversarial examples, we reduce the number of parameters by five orders of magnitude, resulting in a very efficient attack. When investigating the parameters of the learned filters, we observe interesting properties such as a high transferability between models and structures common to classic image filters. Our results provide further insights into the vulnerability of neural networks and their fragility to malicious noise.
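
The abstract does not spell out how the filter parameters are optimized. As one hedged illustration of a gradient-free, query-based alternative consistent with the description, the sketch below samples random 3×3 kernels and keeps the one that flips the most predictions, reusing the hypothetical `apply_adversarial_filter` helper from the earlier sketch. The `model` classifier, the trial budget, and the random-search strategy are all assumptions for illustration, not the paper's method.

```python
# Hypothetical gradient-free search for a 3x3 adversarial filter: sample
# random kernels and keep the one that flips the most predictions on a
# small batch. Illustrative only; not the paper's training procedure.
import torch

@torch.no_grad()
def untargeted_success_rate(model, images, labels, kernel, strength=0.1):
    """Fraction of inputs whose predicted class changes after filtering."""
    adv = torch.stack([apply_adversarial_filter(x, kernel, strength)
                       for x in images])              # filter each image
    preds = model(adv).argmax(dim=1)                  # assumed classifier
    return (preds != labels).float().mean().item()

@torch.no_grad()
def random_search_filter(model, images, labels, trials=500):
    """Return the best of `trials` randomly sampled 3x3 kernels."""
    best_kernel, best_rate = torch.zeros(3, 3), -1.0
    for _ in range(trials):
        kernel = torch.randn(3, 3)    # only 9 parameters to search over
        rate = untargeted_success_rate(model, images, labels, kernel)
        if rate > best_rate:
            best_kernel, best_rate = kernel, rate
    return best_kernel, best_rate
```

The point is only that the search space is tiny: nine values per filter, as opposed to the per-pixel perturbations optimized by standard gradient-based attacks.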