Regularized Adversarial Training for Robust CNN Image Classification: Evaluation of ATWR, ATGR, and EATR Under White-Box Attacks
Abstract
Adversarial attacks pose serious challenges to the robustness of deep Convolutional Neural Networks (CNNs) in image classification. In this study, we evaluate the vulnerability of popular CNN models (ResNet50, ResNet101, AlexNet, MobileNetV2, DenseNet121, and InceptionNetV3) under white-box attacks, including FGSM, PGD, BIM, and C&W. Experiments are conducted on four standard datasets: MNIST, CIFAR-10, CIFAR-100, and ImageNet. To enhance model robustness, we propose three regularized adversarial training methods: ATWR (Adversarial Training with Weight Regularization), ATGR (Adversarial Training with Gradient Regularization), and EATR (Ensemble Adversarial Training with Regularization). Our results show that ATWR reduces the accuracy drop under the PGD attack on CIFAR-10 from 65.93% to 0.00%, and under the C&W attack on MNIST from 100% to 0.31%. EATR achieves consistent robustness across all attacks and models, reducing the accuracy drop on CIFAR-10 (PGD) from 65.93% to 0%, while maintaining classification accuracy within 10% of the original. ATGR, while reducing classification accuracy, enhances adversarial detection by amplifying the difference in output behavior under attack. The proposed methods strike varying trade-offs between robustness, generalization, and detectability. These findings offer practical guidance for securing deep CNNs against strong white-box adversarial threats. The source code is available at: https://github.com/AdversarialAttack/DefenseAndAttack.
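As an illustration only (the abstract does not reproduce the training objectives), the sketch below shows what an ATWR-style update could look like in PyTorch, assuming the standard L-infinity PGD inner maximization and an L2 penalty on the network weights. All function names, the perturbation budget eps, the step size alpha, and the coefficient lam are assumptions made for exposition, not values taken from the paper or the linked repository.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Standard L-infinity PGD: start from a random point in the eps-ball,
    # repeatedly step along the sign of the input gradient, and project back.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def atwr_step(model, optimizer, x, y, lam=1e-4):
    # One adversarial-training step: cross-entropy on PGD examples plus an
    # explicit L2 penalty on the weights ("weight regularization"); lam and
    # the attack budget here are illustrative, not the paper's settings.
    model.eval()   # avoid updating BatchNorm statistics while crafting the attack
    x_adv = pgd_attack(model, x, y)
    model.train()
    weight_penalty = sum(p.pow(2).sum() for p in model.parameters())
    loss = F.cross_entropy(model(x_adv), y) + lam * weight_penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

An ATGR-style step would instead penalize the input gradient, and an EATR-style step would craft adversarial examples against an ensemble of source models; the released repository remains the authoritative reference for the authors' exact formulations.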
DOI: https://doi.org/10.31449/inf.v49i26.8572