A Framework for Malicious Domain Names Detection using Feature Selection and Majority Voting Approach
Abstract
As cyber attacks become more sophisticated, identifying and mitigating malicious domain names has become critical to the security of online environments. This paper presents a framework for detecting malicious domain names using a feature selection strategy and a majority voting method. The suggested methodology begins with the extraction of important features from domain names and their related characteristics, followed by a rigorous feature selection procedure to determine the most discriminating attributes. Several feature selection techniques are used, including chi-square statistics, information gain, gain ratio, and correlation-based feature selection, to assess the value of each feature in distinguishing benign from malicious domain names. In addition, a majority voting strategy is utilised to improve the detection system's overall accuracy and reliability by combining the predictions of different classifiers: AdaBoost, logistic regression, k-nearest neighbours, naive Bayes, and multilayer perceptron. The ensemble of classifiers is trained on the selected features, yielding a complete and robust model capable of accurately recognising malicious domain names while minimising false positives. The proposed approach is evaluated against real-world examples of malicious domain names. The suggested framework, employing chi-square feature selection and majority voting, detects malicious domain names with an accuracy of 99.44%, precision of 99.44%, recall of 99.44%, and F-measure of 99.44%. The use of feature selection and a majority voting technique improves the system's adaptability and resilience in the face of emerging cyber threats.
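The two core steps named in the abstract, chi-square feature ranking and majority voting over classifier outputs, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The feature names (`has_digit_run`, `has_hyphen`, `long_name`), the toy labels, and the five rows of classifier predictions are all hypothetical, and features are assumed to be binary for simplicity.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Chi-square statistic between one categorical feature and the class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            # Expected count if the feature and the label were independent.
            expected = f_count * l_count / n
            score += (observed[(f, l)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    ranked = sorted(
        columns,
        key=lambda name: chi_square_score(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]


def majority_vote(predictions):
    """Hard voting: each inner list holds one classifier's labels for all samples."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical toy data: 1 = malicious, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "has_digit_run": [1, 1, 1, 0, 0, 0],  # perfectly aligned with the labels
    "has_hyphen": [1, 0, 1, 0, 1, 0],
    "long_name": [1, 1, 0, 0, 0, 1],
}
best_features = select_top_k(columns, labels, k=1)

# Hypothetical predictions from five classifiers on four unseen domains.
classifier_predictions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
]
final_labels = majority_vote(classifier_predictions)
```

With an odd number of voters and two classes, as in the five-classifier setting described above, hard voting cannot tie. For other configurations a tie-breaking rule would have to be chosen explicitly.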
DOI: https://doi.org/10.31449/inf.v48i3.5824
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







