Stacking and Voting-Based Boosting Ensembles for Robust Malicious URL Classification

Dharmaraj Rajaram Patil; Tareek M. Pattewar; Trupti S. Shinde; Kavita S. Kumavat; Sujit N. Deshpande

doi:10.31449/inf.v49i35.7762

Abstract

The rising prevalence of malicious URLs poses serious risks to cybersecurity, enabling phishing, malware delivery, and data theft. Conventional blacklist and heuristic-based detection methods struggle to identify emerging and obfuscated attacks. To address this gap, we present an ensemble learning framework that integrates stacking and voting strategies with multiple boosting algorithms for reliable malicious URL classification. The system employs six boosting learners (XGBoost, AdaBoost, Gradient Boosting, LightGBM, CatBoost, and LogitBoost) whose outputs are combined through majority voting and through a two-layer stacking scheme in which logistic regression serves as the meta-learner. Evaluation was carried out on a Kaggle dataset containing 1,043,311 URLs (817,986 benign and 225,325 malicious), using a stratified 70:30 train/test split to preserve the class distribution. The proposed ensembles surpassed individual boosting models and conventional ensembles in accuracy, precision, recall, F1-score, and AUC. Stacking achieved 93.44% across all metrics, while voting achieved 93.25%. In addition to strong predictive performance, the approach shows low prediction latency and effective handling of imbalanced data, making it practical for large-scale, near real-time deployment. This work demonstrates that combining stacking and voting ensembles offers a robust defense against evolving malicious URL threats.
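The two mechanics named in the abstract, a stratified 70:30 split and majority voting over base-model outputs, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names, the toy labels, the three hypothetical base models, and the tie-breaking rule are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.30, seed=0):
    """Return (train_idx, test_idx), splitting each class in the same ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


def majority_vote(predictions):
    """Hard voting: predictions[m][i] is model m's label for sample i."""
    voted = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        # Tie-break (an assumption): choose the smallest label among ties.
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


# Toy labels: 0 = benign, 1 = malicious, imbalanced roughly like the dataset.
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)

# Outputs of three hypothetical base models for four samples.
base_predictions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]
ensemble = majority_vote(base_predictions)
```

With an even number of voters, such as the six learners in the paper, ties are possible, so any real voting scheme needs an explicit tie-breaking rule; the one above is only a placeholder. Stacking differs in that the base-model outputs become input features for a trained meta-learner instead of being counted.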

Author Biography

Dharmaraj Rajaram Patil, Department of Computer Engineering, R.C. Patel Institute of Technology, Shirpur, India

Dharmaraj R. Patil received his Master of Engineering in Computer Science and Engineering from the Government College of Engineering, Aurangabad, Maharashtra, India, and his PhD in Computer Engineering from Kavayitri Bahinabai Chaudhari North Maharashtra University, Jalgaon, Maharashtra, India. He is an Assistant Professor in the Computer Engineering Department at R.C. Patel Institute of Technology, Shirpur, Maharashtra, India, and has 18 years of teaching experience. His research interests are web security, intrusion detection, and web mining. He has published many papers in international and national conferences and journals.

