Vision Transformers Under Data Poisoning Attacks
Date
2023
Authors
Peery, Gabriel
Publisher
University of Oregon
Abstract
Owing to state-of-the-art performance and parallelizability, the Vision Transformer architecture is growing in prevalence for security-critical computer vision tasks. Designers may collect training images from public sources, but such data can be sabotaged: otherwise-natural images may carry subtle, crafted perturbations that cause a specific target image to be misclassified after training. Poisoning attack methods have been developed and tested on ResNets, but Vision Transformers' vulnerability has not been investigated. I develop a new poisoning attack method that augments Witches' Brew with heuristics for choosing which images to poison. I use it to attack DeiT, a Vision Transformer, while it is fine-tuned on benchmarks such as CIFAR-10 classification. I also evaluate how DeiT's image tokenization introduces risk in the form of efficient attacks in which modification is constrained to a limited number of patches per sample. Progressively tightening constraints across extensive experiments, I compare the strength of attacks by observing which remain successful under the most challenging limitations. I find that the choice of objective greatly influences strength, and that constraints on patch count degrade the success rate more than constraints on image count. Selecting patches by attention rollout helps compensate, but selecting images by gradient magnitude increases strength more. Finally, I find that Mixup and CutMix are an effective defense, so I recommend them for security-critical applications.
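The abstract mentions selecting patches by attention rollout, i.e. propagating attention maps through a transformer's layers (with the residual connection folded in) to score how much each patch token influences the class token. The sketch below is a minimal NumPy illustration of that general technique, not the thesis's exact method: the head-averaging, the 0.5 residual weighting, and the top-k selection by the CLS row are all standard but assumed choices here.

```python
import numpy as np

def attention_rollout(attentions):
    """Attention rollout: fold the residual connection into each layer's
    head-averaged attention matrix, renormalize rows, and multiply the
    layers together. `attentions` is a list of (heads, tokens, tokens)
    row-stochastic arrays, ordered from the first layer to the last."""
    n = attentions[0].shape[-1]
    rollout = np.eye(n)
    for attn in attentions:
        a = attn.mean(axis=0)                 # average over heads
        a = 0.5 * a + 0.5 * np.eye(n)         # account for the skip connection
        a = a / a.sum(axis=-1, keepdims=True) # keep rows summing to 1
        rollout = a @ rollout                 # compose with earlier layers
    return rollout

def top_patches_by_rollout(attentions, k):
    """Rank patch tokens by how strongly the CLS token (assumed to be
    token 0) attends to them through the rollout, and return the indices
    of the k highest-scoring patches."""
    rollout = attention_rollout(attentions)
    cls_to_patches = rollout[0, 1:]  # CLS row, patch tokens only
    return np.argsort(cls_to_patches)[::-1][:k]
```

Under a per-sample patch budget, a selection heuristic of this kind would restrict the poisoning perturbation to the returned patch indices instead of modifying the whole image.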
Description
72 pages
Keywords
Deep learning, Data poisoning, Vision Transformer, Cybersecurity, Computer science