Regularized Adversarial Training for Robust CNN Image Classification: Evaluation of ATWR, ATGR, and EATR Under White-Box Attacks

Thi Thanh Thuy Pham, Bao-Chau Ho, Huong-Giang Doan

Abstract


Adversarial attacks pose serious challenges to the robustness of deep Convolutional Neural Networks (CNNs) in image classification. In this study, we evaluate the vulnerability of popular CNN models (ResNet50, ResNet101, AlexNet, MobileNetV2, DenseNet121, and InceptionNetV3) under white-box attacks, including FGSM, PGD, BIM, and C&W. Experiments are conducted on standard datasets: MNIST, CIFAR-10, CIFAR-100, and ImageNet. To enhance model robustness, we propose three regularized adversarial training methods: ATWR (Adversarial Training with Weight Regularization), ATGR (Adversarial Training with Gradient Regularization), and EATR (Ensemble Adversarial Training with Regularization). Our results show that ATWR reduces the accuracy drop under the PGD attack on CIFAR-10 from 65.93% to 0.00%, and under the C&W attack on MNIST from 100% to 0.31%. EATR achieves consistent robustness across all attacks and models, likewise reducing the accuracy drop on CIFAR-10 under PGD from 65.93% to 0%, while keeping classification accuracy within 10% of the original. ATGR, although it lowers classification accuracy, improves adversarial detection by amplifying the difference in output behavior under attack. The proposed methods offer different trade-offs between robustness, generalization, and detectability. These findings provide practical guidance for securing deep CNNs against strong white-box adversarial threats. The source code is available at: https://github.com/AdversarialAttack/DefenseAndAttack.
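The abstract describes ATWR only at a high level. As an illustration, the following short PyTorch sketch shows one plausible form of adversarial training with an explicit weight-regularization term: adversarial examples are generated with a white-box attack (FGSM here) and the cross-entropy loss on those examples is augmented with an L2 penalty on the weights. The attack step, the toy model, and the hyperparameters eps and lambda_w are illustrative assumptions, not the authors' settings; the repository linked above contains the authoritative implementation.

# Minimal ATWR-style sketch (not the authors' released code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_example(model, x, y, eps):
    """Generate a white-box FGSM adversarial example (single gradient-sign step)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).clamp(0.0, 1.0).detach()

def atwr_step(model, optimizer, x, y, eps=8 / 255, lambda_w=1e-4):
    """One adversarial-training step with an explicit L2 weight penalty."""
    model.train()
    x_adv = fgsm_example(model, x, y, eps)
    loss = F.cross_entropy(model(x_adv), y)
    # Explicit weight regularization added to the adversarial loss.
    loss = loss + lambda_w * sum(p.pow(2).sum() for p in model.parameters())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy usage on random CIFAR-10-shaped data; replace with a real CNN and data loader.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.rand(8, 3, 32, 32)
    y = torch.randint(0, 10, (8,))
    print(atwr_step(model, optimizer, x, y))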




DOI: https://doi.org/10.31449/inf.v49i26.8572

This work is licensed under a Creative Commons Attribution 3.0 License.