Hyperparameter Optimization for Malicious URL Detection: Leveraging Optuna and Random Search in Machine Learning and Deep Learning Models
Abstract
Uniform Resource Locators (URLs) are critical indicators for identifying malicious online activities such as malware distribution, phishing attacks, and website defacement. This research presents a robust approach for detecting these threats using both deep learning (DL) and machine learning (ML) techniques. We emphasize hyperparameter optimization, employing Optuna, a Bayesian optimization framework, and Random Search to systematically improve model performance. Unlike many prior studies, which often overlook thorough hyperparameter tuning, our approach demonstrates improvements over state-of-the-art methods. Our Bidirectional Encoder Representations from Transformers (BERT) model achieved an accuracy of 98.84% and an F1 score of 99.02%, while the Light Gradient Boosting Machine (LightGBM) attained an accuracy of 98.46% and an F1 score of 98.45%.
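As a rough illustration of the Random Search strategy named in the abstract, the sketch below draws independent random configurations and keeps the best-scoring one. It is a minimal sketch under stated assumptions and not the authors' pipeline: the search space (`learning_rate`, `num_leaves`), the function names, and the `toy_validation_score` objective are all hypothetical stand-ins for "train a model and return its validation score".

```python
import math
import random


def sample_params(rng):
    """Draw one candidate configuration from a hypothetical search space."""
    return {
        # Log-uniform draw: learning rates are commonly searched on a log scale.
        "learning_rate": 10 ** rng.uniform(-3, 0),
        # Uniform integer draw, both bounds inclusive.
        "num_leaves": rng.randint(8, 256),
    }


def toy_validation_score(params):
    """Hypothetical stand-in for training a model and scoring it.

    Purely illustrative: the score peaks at 1.0 when
    learning_rate = 0.1 and num_leaves = 64.
    """
    lr_penalty = (math.log10(params["learning_rate"]) + 1.0) ** 2
    leaves_penalty = ((params["num_leaves"] - 64) / 256) ** 2
    return 1.0 - lr_penalty - leaves_penalty


def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials independent random configurations; keep the best."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        params = sample_params(rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(toy_validation_score, n_trials=50)
print(best_params, round(best_score, 4))
```

In a real experiment the toy objective would be replaced by model training and validation. A framework such as Optuna differs from this loop mainly in that each new configuration is proposed using the results of earlier trials instead of being drawn independently.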
DOI: https://doi.org/10.31449/inf.v49i27.9106
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







