Girmaw Abebe Tadesse, William Ogallo, et al.
AAAI 2022
A plethora of attack methods has been proposed to generate adversarial examples, among which iterative methods have demonstrated the ability to find strong attacks. However, computing an adversarial perturbation for a new data point requires solving a time-consuming optimization problem from scratch, and generating a stronger attack normally requires updating a data point over more iterations. In this paper, we show the existence of a meta adversarial perturbation (MAP), a better initialization that causes natural images to be misclassified with high probability after only a single gradient-ascent step, and propose an algorithm for computing such perturbations. We conduct extensive experiments, and the empirical results demonstrate that state-of-the-art deep neural networks are vulnerable to meta perturbations. We further show that these perturbations are not only image-agnostic but also model-agnostic: a single perturbation generalizes well across unseen data points and different neural network architectures.
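The one-step adaptation described in the abstract resembles first-order meta-learning (MAML-style): train a shared perturbation so that one further gradient-ascent step on any new image turns it into a strong attack. Below is a minimal PyTorch sketch of that idea under stated assumptions; the function name `meta_adversarial_perturbation`, the step sizes `alpha` and `meta_lr`, the L-infinity budget `epsilon`, and the first-order (FOMAML-like) approximation are all illustrative choices, not the authors' published algorithm.

```python
import torch
import torch.nn.functional as F

def meta_adversarial_perturbation(model, loader, epsilon=8 / 255, alpha=2 / 255,
                                  meta_lr=0.01, epochs=5, device="cpu"):
    """Sketch: learn one image-agnostic perturbation that becomes a strong
    attack after a single extra gradient-ascent step on each new image.
    First-order approximation; hyperparameters are assumptions."""
    model.eval()
    x0, _ = next(iter(loader))
    # One shared perturbation, broadcast over the batch dimension.
    delta = torch.zeros_like(x0[0], device=device)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            # Inner step: one-step gradient ascent starting from the meta perturbation.
            d = delta.detach().clone().requires_grad_(True)
            loss = F.cross_entropy(model(x + d), y)
            grad_d = torch.autograd.grad(loss, d)[0]
            d_adapted = (d + alpha * grad_d.sign()).clamp(-epsilon, epsilon)
            d_adapted = d_adapted.detach().requires_grad_(True)
            # Meta step (first-order): move delta so the adapted perturbation
            # is an even stronger attack on this batch.
            meta_loss = F.cross_entropy(model(x + d_adapted), y)
            meta_grad = torch.autograd.grad(meta_loss, d_adapted)[0]
            delta = (delta + meta_lr * meta_grad.sign()).clamp(-epsilon, epsilon)
    return delta.detach()
```

At attack time, the learned `delta` serves as the initialization: a single gradient-ascent step from `x + delta` (instead of from the clean image) plays the role of the "one-step update" the abstract refers to.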
Saiteja Utpala, Alex Gu, et al.
NAACL 2024
Megh Thakkar, Quentin Fournier, et al.
ACL 2024
Gururaj Saileshwar, Prashant J. Nair, et al.
HPCA 2018