AI-Driven Cybersecurity: Enhancing Malware Detection through Machine Learning

Abstract

In a digital landscape where malicious software evolves faster than traditional defenses can adapt, the ability to detect threats intelligently and proactively has become essential. This study presents a structured machine learning approach to malware detection based on static analysis of malicious and benign executable samples. Framing the task as a supervised classification problem, the methodology examines how model performance is shaped by training set size, feature selection strategy, and class distribution. Four classification algorithms (a backpropagation neural network, a decision tree, a random forest, and a support vector machine) were evaluated through a pipeline comprising data preprocessing, transformation, feature selection, and performance assessment. Special attention is given to how class imbalance influences learning dynamics and misclassification patterns. Rather than relying solely on accuracy, the study uses confusion matrices to examine the qualitative behavior of each model. The resulting contrasts between balanced and unbalanced scenarios expose critical considerations for deploying scalable and interpretable malware detection systems in real-world cybersecurity contexts.
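To illustrate why a confusion matrix can be more informative than accuracy under class imbalance, here is a minimal sketch in plain Python. It is not the study's pipeline or data: the label vectors, the 95/5 class split, and the helper names `confusion_matrix` and `summarize` are all hypothetical, chosen only to show the effect.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels: 1 = malware, 0 = benign."""
    counts = Counter(zip(y_true, y_pred))
    return counts[(0, 0)], counts[(0, 1)], counts[(1, 0)], counts[(1, 1)]


def summarize(y_true, y_pred):
    """Compute accuracy and malware recall from the confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Recall on the malware class: share of malware samples that were caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tn": tn, "fp": fp, "fn": fn, "tp": tp,
            "accuracy": accuracy, "malware_recall": recall}


# Hypothetical imbalanced test set: 95 benign samples, then 5 malware samples.
y_true = [0] * 95 + [1] * 5

# Hypothetical degenerate classifier that labels every sample as benign.
y_pred_majority = [0] * 100

# Hypothetical classifier: 3 false alarms on benign, 4 of 5 malware caught.
y_pred_model = [0] * 92 + [1] * 3 + [1] * 4 + [0] * 1

majority = summarize(y_true, y_pred_majority)
model = summarize(y_true, y_pred_model)

print("always-benign:", majority)
print("model:        ", model)
```

In this made-up scenario the two classifiers differ by only one percentage point in accuracy, yet the always-benign classifier misses every malware sample while the second one catches four of five. That difference is visible only in the per-class counts.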

Authors

  • Isai Moreno-Lara, Facultad de Ingeniería, Universidad Autónoma de San Luis Potosí
  • Alejandra Silva-Trujillo, Facultad de Ingeniería, Universidad Autónoma de San Luis Potosí
  • Juan C. Cuevas-Tello, Facultad de Ingeniería, Universidad Autónoma de San Luis Potosí
  • Jose Nunez-Varela, Facultad de Ingeniería, Universidad Autónoma de San Luis Potosí

DOI:

https://doi.org/10.31449/inf.v49i37.10728

Published

12/25/2025

How to Cite

Moreno-Lara, I., Silva-Trujillo, A., Cuevas-Tello, J. C., & Nunez-Varela, J. (2025). AI-Driven Cybersecurity: Enhancing Malware Detection through Machine Learning. Informatica, 49(37). https://doi.org/10.31449/inf.v49i37.10728