AI-Driven Cybersecurity: Enhancing Malware Detection through Machine Learning
Abstract
In a digital landscape where malicious software evolves faster than traditional defenses can adapt, the ability to detect threats intelligently and proactively has become essential. This study presents a structured machine learning approach to malware detection using static analysis of malware and benign executable samples. By framing the task as a supervised classification problem, the methodology explores how model performance is shaped by variations in training size, feature selection strategy, and class distribution. Four classification algorithms (a backpropagation neural network, decision tree, random forest, and support vector machine) were evaluated through a rigorous pipeline that includes data preprocessing, transformation, feature selection, and performance assessment. Special attention is given to how class imbalance influences learning dynamics and misclassification patterns. Rather than focusing solely on accuracy, the study leverages confusion matrices to examine the qualitative behavior of each model. The resulting contrasts between balanced and unbalanced scenarios expose critical considerations for deploying scalable and interpretable malware detection systems in real-world cybersecurity contexts.
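To illustrate why the abstract stresses confusion matrices over accuracy alone, the following is a minimal sketch in plain Python. It is not the study's pipeline: the label vectors, the class ratios, and the helper names `confusion_counts` and `summarize` are hypothetical and chosen only for illustration. It compares a degenerate "everything is benign" predictor on an unbalanced test set with an imperfect predictor on a balanced one.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task; `positive` marks the malware class."""
    pairs = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = pairs[(True, True)]    # malware correctly flagged
    fp = pairs[(False, True)]   # benign flagged as malware
    fn = pairs[(True, False)]   # malware missed
    tn = pairs[(False, False)]  # benign correctly passed
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Compute accuracy and malware recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "recall": recall}


# Hypothetical unbalanced test set: 95 benign (0) and 5 malware (1) samples.
y_true_unbalanced = [0] * 95 + [1] * 5
# A degenerate model that labels every sample as benign.
y_pred_majority = [0] * 100
unbalanced = summarize(y_true_unbalanced, y_pred_majority)

# Hypothetical balanced test set: 50 benign and 50 malware samples.
y_true_balanced = [0] * 50 + [1] * 50
# An imperfect model: 5 benign samples flagged, 10 malware samples missed.
y_pred_balanced = [0] * 45 + [1] * 5 + [1] * 40 + [0] * 10
balanced = summarize(y_true_balanced, y_pred_balanced)

print("unbalanced:", unbalanced)
print("balanced:  ", balanced)
```

In this toy setup the majority-class predictor reaches higher accuracy on the unbalanced set while detecting no malware at all, which only the false-negative cell of the confusion matrix reveals.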
DOI: https://doi.org/10.31449/inf.v49i37.10728
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







